ASYMPTOTICALLY MOST POWERFUL TESTS OF STATISTICAL HYPOTHESES¹

By Abraham Wald²

Columbia University, New York City

1. Introduction. Let $f(x, \theta)$ be the probability density function of a variate $x$ involving an unknown parameter $\theta$. For testing the hypothesis $\theta = \theta_0$ by means of $n$ independent observations $x_1, \dots, x_n$ on $x$ we have to choose a region of rejection $W_n$ in the $n$-dimensional sample space. Denote by $P(W_n \mid \theta)$ the probability that the sample point $E = (x_1, \dots, x_n)$ will fall in $W_n$ under the assumption that $\theta$ is the true value of the parameter. For any region $U_n$ of the $n$-dimensional sample space denote by $g(U_n)$ the greatest lower bound of $P(U_n \mid \theta)$. For any pair of regions $U_n$ and $T_n$ denote by $L(U_n, T_n)$ the least upper bound of

$P(U_n \mid \theta) - P(T_n \mid \theta).$

In all that follows we shall denote a region of the $n$-dimensional sample space by a capital letter with the subscript $n$.

Definition 1. A sequence $\{W_n\}$, $(n = 1, 2, \dots,$ ad inf.), of regions is said to be an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ on the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \alpha$ and if for any sequence $\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \alpha$, the inequality

$\limsup_{n\to\infty} L(Z_n, W_n) \le 0$

holds.

Definition 2. A sequence $\{W_n\}$, $(n = 1, 2, \dots,$ ad inf.), of regions is said to be an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$ on the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \lim_{n\to\infty} g(W_n) = \alpha$, and if for any sequence $\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \lim_{n\to\infty} g(Z_n) = \alpha$, the inequality

$\limsup_{n\to\infty} L(Z_n, W_n) \le 0$

holds.

Let $\hat\theta_n(x_1, \dots, x_n)$ be the maximum likelihood estimate of $\theta$ in the $n$-dimensional sample space. That is to say, $\hat\theta_n(x_1, \dots, x_n)$ denotes the value of $\theta$ for which the product $\prod_{\alpha=1}^{n} f(x_\alpha, \theta)$ becomes a maximum. Let $W_n'$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge c_n'$, $W_n''$ the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \le c_n''$, and let $W_n$ consist of all points for which at least one of the inequalities

$\sqrt{n}(\hat\theta_n - \theta_0) \ge c_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \le -c_n$

is satisfied. The constants $c_n$, $c_n'$, $c_n''$ are chosen such that

$P(W_n' \mid \theta_0) = P(W_n'' \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$

It will be shown in this paper that under certain restrictions on the probability density $f(x, \theta)$ the sequence $\{W_n'\}$ is an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ if $\theta$ takes only values $\ge \theta_0$. Similarly $\{W_n''\}$ is an asymptotically most powerful test if $\theta$ takes only values $\le \theta_0$. Finally $\{W_n\}$ is an asymptotically most powerful unbiased test if $\theta$ can take any real value.

¹ Presented to the American Mathematical Society at New York, February 24, 1940.
² Research under a grant-in-aid from the Carnegie Corporation of New York.
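The three regions can be made concrete in a model where everything is available in closed form. The following is a minimal sketch, assuming (this is not an example from the paper) that $x$ is normal with unknown mean $\theta$ and known standard deviation `sigma`, so that the maximum likelihood estimate is the sample mean and $\sqrt{n}(\hat\theta_n - \theta_0)$ is exactly normal for every $n$. The function names are hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha, sigma=1.0):
    """Return (c'_n, c''_n, c_n) for the regions W'_n, W''_n, W_n.

    Assumption: x ~ N(theta, sigma^2) with sigma known, so that
    sqrt(n) * (mean - theta0) is exactly N(0, sigma^2) under theta0
    and the constants do not depend on n.
    """
    z = NormalDist(0.0, sigma)
    c_upper = z.inv_cdf(1.0 - alpha)        # W'_n : statistic >= c'_n
    c_lower = z.inv_cdf(alpha)              # W''_n: statistic <= c''_n
    c_two = z.inv_cdf(1.0 - alpha / 2.0)    # W_n  : |statistic| >= c_n
    return c_upper, c_lower, c_two

def power(theta, theta0, n, alpha, sigma=1.0):
    """Exact P(region | theta) for W'_n, W''_n and W_n, as a tuple."""
    c_upper, c_lower, c_two = critical_values(alpha, sigma)
    shift = (n ** 0.5) * (theta - theta0)   # mean of the statistic under theta
    stat = NormalDist(shift, sigma)
    p_upper = 1.0 - stat.cdf(c_upper)
    p_lower = stat.cdf(c_lower)
    p_two = stat.cdf(-c_two) + 1.0 - stat.cdf(c_two)
    return p_upper, p_lower, p_two
```

At $\theta = \theta_0$ all three probabilities equal $\alpha$; for $\theta > \theta_0$ the one-sided region $W_n'$ has the largest power of the three, which is the behavior the paper's theorems describe asymptotically for general densities.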

2. Assumptions on the density function $f(x, \theta)$.

Assumption 1. For any positive $k$

$\lim_{n\to\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$

uniformly in $\theta$, where $P(-k < \hat\theta_n - \theta < k \mid \theta)$ denotes the probability that $-k < \hat\theta_n - \theta < k$ under the assumption that $\theta$ is the true value of the parameter.

Assumption 1 implies somewhat more than consistency of the maximum likelihood estimate $\hat\theta_n$. In fact, consistency means only that for any positive $k$

$\lim_{n\to\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1,$

without asking that the convergence should be uniform in $\theta$. If $\hat\theta_n$ satisfies Assumption 1 we shall say that $\hat\theta_n$ is a uniformly consistent estimate of $\theta$. A rigorous proof of the consistency of $\hat\theta_n$ (under certain restrictions on $f(x, \theta)$) was given by J. L. Doob.³ In an appendix to this paper it will be shown that under certain conditions $\hat\theta_n$ is uniformly consistent.

Denote by $E_\theta[\psi(x)]$ the expected value of $\psi(x)$ under the assumption that $\theta$ is the true value of the parameter. That is to say,

$E_\theta[\psi(x)] = \int_{-\infty}^{\infty} \psi(x) f(x, \theta)\, dx.$

For any $x$, for any positive $\delta$, and for any $\theta_1$, denote by $\varphi_1(x, \theta_1, \delta)$ the greatest lower bound, and by $\varphi_2(x, \theta_1, \delta)$ the least upper bound of $\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}$ in the interval $\theta_1 - \delta \le \theta \le \theta_1 + \delta$.

Assumption 2. There exists a positive value $k_0$ such that the expectations $E_\theta \varphi_1(x, \theta_1, \delta)$ and $E_\theta \varphi_2(x, \theta_1, \delta)$ exist and are continuous functions of $\theta$, $\theta_1$ and $\delta$ in the domain $D$ defined by the inequalities: $0 \le \delta \le \tfrac12 k_0$, $\theta_0 - \tfrac12 k_0 \le \theta_1 \le \theta_0 + \tfrac12 k_0$, $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$. Furthermore the expectations $E_\theta[\varphi_1(x, \theta_1, \delta)]^2$ and $E_\theta[\varphi_2(x, \theta_1, \delta)]^2$ exist in $D$ and have a finite upper bound in $D$.

³ J. L. Doob, "Probability and statistics," Trans. Am. Math. Soc., Vol. 36 (1934).

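Assumption 3. There exists a positive value $k_0$ such that

$\int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial\theta}\, dx = \int_{-\infty}^{\infty} \frac{\partial^2 f(x, \theta)}{\partial\theta^2}\, dx = 0 \quad \text{for } \theta_0 - k_0 \le \theta \le \theta_0 + k_0.$

Assumption 3 means simply that we may differentiate with respect to $\theta$ under the integral sign. In fact

$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1$

identically in $\theta$. Hence

$\frac{\partial}{\partial\theta} \int_{-\infty}^{\infty} f(x, \theta)\, dx = \frac{\partial^2}{\partial\theta^2} \int_{-\infty}^{\infty} f(x, \theta)\, dx = 0.$

Differentiating under the integral sign, we obtain the relations in Assumption 3.

Assumption 4. There exists a positive $\eta$ and a positive $k_0$ such that

$E_\theta \left| \frac{\partial \log f(x, \theta)}{\partial\theta} \right|^{2+\eta}$

exists and has a finite upper bound in the interval $\theta_0 - k_0 \le \theta \le \theta_0 + k_0$.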
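Assumption 3 can be checked numerically for a concrete family. The sketch below assumes the exponential density $f(x, \theta) = \theta e^{-\theta x}$ for $x \ge 0$ (an illustrative choice, not taken from the paper), writes out its first two $\theta$-derivatives by hand, and integrates them with a composite Simpson rule. The helper names and the truncation point `60 / theta` are assumptions of this sketch.

```python
import math

def simpson(g, a, b, m=20000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3.0

def check_assumption_3(theta, upper=None):
    """Return the integrals of df/dtheta and d2f/dtheta2 over [0, upper].

    For f(x, theta) = theta * exp(-theta * x):
      df/dtheta   = (1 - theta * x) * exp(-theta * x)
      d2f/dtheta2 = (theta * x**2 - 2 * x) * exp(-theta * x)
    Both integrals should be close to zero if Assumption 3 holds.
    """
    upper = 60.0 / theta if upper is None else upper  # tail beyond is negligible
    d1 = lambda x: (1.0 - theta * x) * math.exp(-theta * x)
    d2 = lambda x: (theta * x * x - 2.0 * x) * math.exp(-theta * x)
    return simpson(d1, 0.0, upper), simpson(d2, 0.0, upper)
```

Both returned values are numerically zero for any positive `theta`, which is what the relations in Assumption 3 require.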

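3. Some propositions. Denote $\sqrt{n}(\hat\theta_n - \theta)$ by $z_n(\theta)$ and denote the probability $P[z_n(\theta) < t \mid \theta]$ by $\Phi_n(t, \theta)$.

Proposition 1. Within the $\theta$-interval $[\theta_0 - \tfrac12 k_0, \theta_0 + \tfrac12 k_0]$, $\Phi_n(t, \theta)$ converges with $n \to \infty$ uniformly in $t$ and $\theta$ towards the cumulative normal distribution with zero mean and variance

$-1 \Big/ E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$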
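As a hedged illustration of the variance in Proposition 1 (the exponential family is an assumption chosen for convenience, not an example from the paper), take $f(x, \theta) = \theta e^{-\theta x}$ for $x \ge 0$. Then

$$\log f(x, \theta) = \log\theta - \theta x, \qquad \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = -\frac{1}{\theta^2}, \qquad -1 \Big/ E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = \theta^2 .$$

The maximum likelihood estimate is $\hat\theta_n = n / \sum_\alpha x_\alpha$, so in this family the proposition says that $\sqrt{n}(\hat\theta_n - \theta)$ is approximately normal with zero mean and variance $\theta^2$ for large $n$.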

Proof: In all that follows we assume that $\theta$ takes only values in the interval $[\theta_0 - k_0, \theta_0 + k_0]$, except when the contrary is explicitly stated. Furthermore we introduce the variable $\theta_1$ and assume that $\theta_1$ takes only values in the interval $[\theta_0 - \tfrac12 k_0, \theta_0 + \tfrac12 k_0]$.

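Because of Assumption 3 we have

(1)  $E_\theta \frac{\partial \log f(x, \theta)}{\partial\theta} = 0.$

Since

$\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = \frac{1}{f(x, \theta)} \frac{\partial^2 f(x, \theta)}{\partial\theta^2} - \frac{1}{[f(x, \theta)]^2} \left[ \frac{\partial f(x, \theta)}{\partial\theta} \right]^2,$

we get from Assumption 3

(2)  $E_\theta \left[ \frac{\partial \log f(x, \theta)}{\partial\theta} \right]^2 = -E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$

Hence

(3)  $d(\theta) = -E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} > 0.$

Consider the Taylor expansion

(4)  $\sum_{\alpha=1}^{n} \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} = \sum_{\alpha=1}^{n} \frac{\partial \log f(x_\alpha, \theta_1)}{\partial\theta} + (\theta - \theta_1) \sum_{\alpha=1}^{n} \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2},$

where $\theta'$ lies in the interval $[\theta_1, \theta]$. Denote $\frac{1}{\sqrt{n}} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta_1)}{\partial\theta}$ by $y_n(\theta_1)$.

For $\theta = \hat\theta_n$ the left hand side of (4) is equal to zero. Hence we have

(5)  $y_n(\theta_1) + [\sqrt{n}(\hat\theta_n - \theta_1)]\, \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} = 0,$

or

(6)  $y_n(\theta_1) + z_n(\theta_1)\, \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} = 0.$

Let $Q_n(\theta_1)$ be the region defined by the inequality

(7)  $\left| \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} + d(\theta_1) \right| < \nu,$

where $\nu$ denotes a positive number less than the greatest lower bound of $d(\theta_1)$. We shall prove that

(8)  $\lim_{n\to\infty} P[Q_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$. Let $\tau_0$ be a positive number such that

(9)  $\left| E_{\theta_1} \varphi_i(x, \theta_1, \tau_0) - E_{\theta_1} \frac{\partial^2 \log f(x, \theta_1)}{\partial\theta^2} \right| < \frac{\nu}{2}, \quad (i = 1, 2)$

for all values of $\theta_1$. Because of Assumption 2 such a $\tau_0$ certainly exists. Denote by $R_n(\theta_1)$ the region defined by the inequality

(10)  $|\hat\theta_n - \theta_1| \le \tau_0.$

On account of Assumption 1

(11)  $\lim_{n\to\infty} P[R_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$. Since $\theta'$ lies in the interval $[\theta_1, \hat\theta_n]$, we have

(12)  $|\theta' - \theta_1| \le \tau_0$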

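for all points in $R_n(\theta_1)$. Hence at any point in $R_n(\theta_1)$ the inequality

(13)  $\sum_{\alpha=1}^{n} \varphi_1(x_\alpha, \theta_1, \tau_0) \le \sum_{\alpha=1}^{n} \frac{\partial^2 \log f(x_\alpha, \theta')}{\partial\theta^2} \le \sum_{\alpha=1}^{n} \varphi_2(x_\alpha, \theta_1, \tau_0)$

holds.

Let $S_n(\theta_1)$ be defined by the inequality

(14)  $\left| \frac{1}{n} \sum_\alpha \varphi_1(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \varphi_1(x, \theta_1, \tau_0) \right| < \frac{\nu}{2}$

and $T_n(\theta_1)$ by the inequality

(15)  $\left| \frac{1}{n} \sum_\alpha \varphi_2(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \varphi_2(x, \theta_1, \tau_0) \right| < \frac{\nu}{2}.$

On account of Assumption 2 we have

(16)  $\lim_{n\to\infty} P[S_n(\theta_1) \mid \theta_1] = \lim_{n\to\infty} P[T_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$.

Denote by $U_n(\theta_1)$ the common part of the regions $R_n(\theta_1)$, $S_n(\theta_1)$ and $T_n(\theta_1)$. In $U_n(\theta_1)$ we have on account of (9), (14) and (15)

(17)  $\left| \frac{1}{n} \sum_\alpha \varphi_i(x_\alpha, \theta_1, \tau_0) - E_{\theta_1} \frac{\partial^2 \log f(x, \theta_1)}{\partial\theta^2} \right| < \nu \quad (i = 1, 2).$

From this we obtain (7) because of (13). That is to say, the inequality (7) is valid everywhere in $U_n(\theta_1)$. Since

$\lim_{n\to\infty} P[U_n(\theta_1) \mid \theta_1] = 1$

uniformly in $\theta_1$, our statement about $Q_n(\theta_1)$ is proved. From (6) and (7) we get that everywhere in $Q_n(\theta_1)$ the inequalities hold: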


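(18)  $\frac{y_n(\theta_1)}{d(\theta_1) + \nu} < z_n(\theta_1) < \frac{y_n(\theta_1)}{d(\theta_1) - \nu} \quad \text{if } y_n(\theta_1) > 0;$

(19)  $\frac{y_n(\theta_1)}{d(\theta_1) + \nu} > z_n(\theta_1) > \frac{y_n(\theta_1)}{d(\theta_1) - \nu} \quad \text{if } y_n(\theta_1) < 0.$

Let $z_n^*(\theta_1)$ be defined as follows: $z_n^*(\theta_1) = z_n(\theta_1)$ at any point in $Q_n(\theta_1)$, and $z_n^*(\theta_1) = y_n(\theta_1)/d(\theta_1)$ at any point outside $Q_n(\theta_1)$.

On account of (8) we obviously have

(20)  $\lim_{n\to\infty} \{ P[z_n^*(\theta_1) < t \mid \theta_1] - P[z_n(\theta_1) < t \mid \theta_1] \} = 0$

uniformly in $t$ and $\theta_1$.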

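From equation (1) it follows that $E_{\theta_1} y_n(\theta_1) = 0$. From Assumption 4 it follows on account of the general limit theorems

(21)  $\lim_{n\to\infty} \left\{ P[y_n(\theta_1) < t \mid \theta_1] - \frac{1}{\sqrt{2\pi d(\theta_1)}} \int_{-\infty}^{t} e^{-\frac{v^2}{2 d(\theta_1)}}\, dv \right\} = 0$

uniformly in $t$ and $\theta_1$. Hence

$\lim_{n\to\infty} \left\{ P\left[ \frac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1 \right] - \sqrt{\frac{d(\theta_1)}{2\pi}} \int_{-\infty}^{t} e^{-\frac12 d(\theta_1) v^2}\, dv \right\} = 0$

uniformly in $t$ and $\theta_1$. Since $\nu$ can be chosen arbitrarily small, we get easily from (18), (19), (20) and (21)

(22)  $\lim_{n\to\infty} \left\{ P\left[ \frac{y_n(\theta_1)}{d(\theta_1)} < t \,\Big|\, \theta_1 \right] - P[z_n(\theta_1) < t \mid \theta_1] \right\} = 0$

uniformly in $t$ and $\theta_1$. Proposition 1 follows from (21) and (22).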
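A small simulation can make Proposition 1 tangible. The sketch below assumes the exponential density $f(x, \theta) = \theta e^{-\theta x}$ (an illustrative family, not one treated in the paper), for which $\hat\theta_n = n / \sum_\alpha x_\alpha$ and the limiting variance $-1 / E_\theta\, \partial^2 \log f / \partial\theta^2$ equals $\theta^2$. Function names, sample sizes and the seed are hypothetical choices.

```python
import random

def simulate_scaled_mle(theta, n, reps, seed=0):
    """Draw `reps` values of sqrt(n) * (theta_hat_n - theta).

    Assumption: observations are exponential with rate theta, so the
    maximum likelihood estimate is n divided by the sum of the sample.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        total = sum(rng.expovariate(theta) for _ in range(n))
        theta_hat = n / total
        values.append((n ** 0.5) * (theta_hat - theta))
    return values

def sample_mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    m = sample_mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)
```

With `theta = 2.0` and `n = 400`, the sample variance of the simulated values should lie near $\theta^2 = 4$ and the sample mean near zero, up to simulation noise and a finite-$n$ bias of order $1/\sqrt{n}$.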

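Proposition 2. Let $\{W_n\}$ be a sequence of regions of size $\alpha$, i.e. $P(W_n \mid \theta_0) = \alpha$, and let $V_n(z)$ be the region defined by the inequality

$(\hat\theta_n - \theta_0)\sqrt{n} < z.$

Let $U_n(z)$ be the intersection of $V_n(z)$ and $W_n$, and denote $P[U_n(z) \mid \theta_0]$ by $F_n(z)$. Denote furthermore $P[W_n \mid \theta_0 + \mu/\sqrt{n}]$ by $G(\mu, n)$. If $F_n(z)$ converges to $F(z)$ and if $\lim_{n\to\infty} \mu_n = \mu$, then

(23)  $\lim_{n\to\infty} G(\mu_n, n) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, dF(z)$

where

$c = -1 \Big/ E_{\theta_0} \frac{\partial^2 \log f(x, \theta_0)}{\partial\theta^2}.$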

Proof: First we show

(24)  $\int_{-\infty}^{\infty} dF(z) = \alpha.$

Denote $P[V_n(z) \mid \theta_0]$ by $\Phi_n(z)$. On account of Proposition 1 $\Phi_n(z)$ converges uniformly to the cumulative normal distribution $\psi(z)$ with zero mean and variance $c$. It is obvious that

(25)  $F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1) \quad \text{for } z_2 > z_1.$

Hence

(26)  $F(z_2) - F(z_1) \le \psi(z_2) - \psi(z_1) \quad \text{for } z_2 > z_1.$

From (25) we get

(27)  $[\lim_{z\to\infty} F_n(z)] - F_n(z) = \alpha - F_n(z) \le 1 - \Phi_n(z).$

Hence

(28)  $\alpha - F(z) \le 1 - \psi(z).$

Since $F_n(z) \le \alpha$ and therefore also $F(z) \le \alpha$, we get from (28)

$0 \le \alpha - F(z) \le 1 - \psi(z).$

Hence

(29)  $\lim_{z\to\infty} F(z) = \alpha.$

Since $F_n(z) \le \Phi_n(z)$, we have $F(z) \le \psi(z)$, and therefore

(30)  $\lim_{z\to-\infty} F(z) = 0.$

The equation (24) follows from (29) and (30).

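It follows easily from (26) that the integral on the right hand side of the equation (23) exists and is finite.

Let us denote $\theta_0 + \mu_n/\sqrt{n}$ by $\theta_n$. Consider the Taylor expansions

(31)  $\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_0 - \hat\theta_n) \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_0 - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n')$

and

(32)  $\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + (\theta_n - \hat\theta_n) \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_n - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n'')$

where $\theta_n'$ lies in the interval $[\theta_0, \hat\theta_n]$ and $\theta_n''$ lies in the interval $[\theta_n, \hat\theta_n]$. Since $\hat\theta_n$ is the maximum likelihood estimate, the first-order terms vanish and we get from (31) and (32)

(33)  $\sum_\alpha \log f(x_\alpha, \theta_0) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_0 - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n'),$

(34)  $\sum_\alpha \log f(x_\alpha, \theta_n) = \sum_\alpha \log f(x_\alpha, \hat\theta_n) + \tfrac12 (\theta_n - \hat\theta_n)^2 \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n'').$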

Denote by $\beta$ a real variable which can take any value between $-2\mu$ and $+2\mu$. Denote by $R_n$ the region defined by the inequality

(35) I - 00 1 < n"*. 

From Proposition 1 it follows easily that

(36)  $\lim_{n\to\infty} P(R_n \mid \theta_0 + \beta/\sqrt{n}) = 1$

uniformly in 0. Denote 2n“^ by r„ . Then for almost all n the following 
inequalities hold at any point in Rn : 

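(37)  $\sum_\alpha \varphi_1(x_\alpha, \theta_0, \tau_n) \le \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n') \le \sum_\alpha \varphi_2(x_\alpha, \theta_0, \tau_n),$

(38)  $\sum_\alpha \varphi_1(x_\alpha, \theta_0, \tau_n) \le \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta_n'') \le \sum_\alpha \varphi_2(x_\alpha, \theta_0, \tau_n).$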

Denote by $S_n$ the region in which (35), (37) and (38) simultaneously hold. It is obvious that

$\lim_{n\to\infty} P(S_n \mid \theta_0 + \beta/\sqrt{n}) = 1$

uniformly in $\beta$. Denote $\theta_0 + \beta/\sqrt{n}$ by $\theta_n(\beta)$. From Assumption 2 it follows easily that

> So I Tn)l *2 < 

(39) lim -5 = ^ »«) = -T 2) 

n-« ( n j or c 

uniformly in /9. Furthermore the variance of 2Z ^ if ^^(/ 3 ) is the 

a W 

true value of the parameter S, converges to zero with n -+ * uniformly in |3. 
Hence a sequence $\{\lambda_n\}$, $(n = 1, 2, \dots,$ ad inf.), of positive numbers can be given such that

(40)  $\lim_{n\to\infty} \lambda_n = 0$

and

(41)  $\lim_{n\to\infty} P[T_n \mid \theta_n(\beta)] = 1$

uniformly in $\beta$, where the region $T_n$ is defined by the inequality

(42) I Z - + - < X„n”* (t = 1, 2). 

I o n c 

From (37) and (38) it follows that in the intersection Tl of Tn and Sn 

(43) 1 1 21 -L log/(a;«, ^n) + 7 < 

n a c 

and 

(44) I ^ 23 log/(x„, S") + 7 < Xnn"*. 

! n a otr c 

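We get from (33), (34), (36), (43) and (44) that at any point in $T_n'$

(45)  $\sum_\alpha \log f(x_\alpha, \theta_n) - \sum_\alpha \log f(x_\alpha, \theta_0) = \frac{n}{2c} \left[ (\theta_0 - \hat\theta_n)^2 - (\theta_n - \hat\theta_n)^2 \right] + \epsilon_n,$

where $|\epsilon_n| \le \rho\lambda_n$, and $\rho$ denotes a constant not depending on $n$.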
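In one familiar model the quadratic form in (45) is exact, which gives a quick sanity check on the signs and the factor $n/(2c)$. The sketch below assumes (this is not the paper's example) normal observations with unknown mean $\theta$ and known standard deviation `sigma`, so that $\hat\theta_n$ is the sample mean, $c = \sigma^2$, and the remainder term is identically zero. All function names are hypothetical.

```python
import math

def log_density(x, theta, sigma):
    """log f(x, theta) for the N(theta, sigma^2) density."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (x - theta) ** 2 / (2.0 * sigma ** 2))

def log_likelihood_ratio(xs, theta_n, theta0, sigma):
    """Left hand side of (45): sum of log f at theta_n minus sum at theta0."""
    return sum(log_density(x, theta_n, sigma) - log_density(x, theta0, sigma)
               for x in xs)

def quadratic_form(xs, theta_n, theta0, sigma):
    """Right hand side of (45) without the remainder, using c = sigma^2."""
    n = len(xs)
    theta_hat = sum(xs) / n          # maximum likelihood estimate in this model
    c = sigma ** 2
    return n / (2.0 * c) * ((theta0 - theta_hat) ** 2 - (theta_n - theta_hat) ** 2)
```

For any sample the two functions agree up to rounding, for example with `xs = [0.3, -1.2, 0.8, 2.1, 0.05, -0.4]`, `theta0 = 0.0` and `theta_n = 0.0 + 1.5 / math.sqrt(len(xs))`.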

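On account of (36) and (41) we have

(46)  $\lim_{n\to\infty} P[T_n' \mid \theta_n(\beta)] = 1$

uniformly in $\beta$.

Denote by $T_n''(z)$ the intersection of $U_n(z)$ (defined in Proposition 2) and $T_n'$. Denote furthermore $P[T_n''(z) \mid \theta_0]$ by $F_n^*(z)$.

Since

$n[(\theta_0 - \hat\theta_n)^2 - (\theta_n - \hat\theta_n)^2] = n[(\theta_0 - \hat\theta_n)^2 - (\theta_0 - \hat\theta_n + \mu_n/\sqrt{n})^2] = -\mu_n^2 + 2\sqrt{n}\,\mu_n(\hat\theta_n - \theta_0),$

we get from (45) and (46)

(47)  $\lim_{n\to\infty} \left\{ P[T_n''(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-\frac12 (\mu_n^2 - 2\mu_n t)/c}\, dF_n^*(t) \right\} = 0$

uniformly in $z$. It is obvious that

(48)  $\lim_{n\to\infty} \{ P[T_n''(z) \mid \theta_n] - P[U_n(z) \mid \theta_n] \} = 0$

uniformly in $z$. Hence we get from (47)

(49)  $\lim_{n\to\infty} \left\{ P[U_n(z) \mid \theta_n] - \int_{-\infty}^{z} e^{-\frac12 (\mu_n^2 - 2\mu_n t)/c}\, dF_n^*(t) \right\} = 0$

uniformly in $z$. It follows from (49) that for any positive $L$

(50)  $\lim_{n\to\infty} \left\{ P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n] - \int_{-L}^{L} e^{-\frac12 (\mu_n^2 - 2\mu_n t)/c}\, dF_n^*(t) \right\} = 0.$

Since $\lim_{n\to\infty} \mu_n = \mu$, $\lim_{n\to\infty} [F_n^*(t) - F_n(t)] = 0$ uniformly in $t$, and since $\lim_{n\to\infty} F_n(t) = F(t)$ uniformly in $t$, we get from (50)

(51)  $\lim_{n\to\infty} \{ P[U_n(L) \mid \theta_n] - P[U_n(-L) \mid \theta_n] \} = \int_{-L}^{L} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t).$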

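Now let us calculate the limit of $P[V_n(z) \mid \theta_n]$ if $n \to \infty$. The region $V_n(z)$ is defined by the inequality

(52)  $(\hat\theta_n - \theta_0)\sqrt{n} < z.$

This inequality can be written as follows:

(53)  $(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n.$

Since $\lim \mu_n = \mu$, we get on account of Proposition 1

(54)  $\lim_{n\to\infty} P[(\hat\theta_n - \theta_n)\sqrt{n} < z - \mu_n \mid \theta_n] = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z-\mu} e^{-\frac{t^2}{2c}}\, dt = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{(t-\mu)^2}{2c}}\, dt.$

Hence

(55)  $\lim_{n\to\infty} P[V_n(z) \mid \theta_n] = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{(t-\mu)^2}{2c}}\, dt$

uniformly in $z$.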

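For any positive $\epsilon$ let $L_\epsilon$ denote the positive number satisfying the condition:

(56)  $\frac{1}{\sqrt{2\pi c}} \left[ \int_{-\infty}^{-L_\epsilon} e^{-\frac{(t-\mu)^2}{2c}}\, dt + \int_{L_\epsilon}^{\infty} e^{-\frac{(t-\mu)^2}{2c}}\, dt \right] = \frac{\epsilon}{2}.$

From (56) we easily get on account of (26)

(57)  $0 \le \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t) - \int_{-L_\epsilon}^{L_\epsilon} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t) \le \frac{\epsilon}{2}.$

Since the region $U_n(z_2) - U_n(z_1)$ is a subset of $V_n(z_2) - V_n(z_1)$ for $z_2 > z_1$, we have on account of (55) and (56)

(58)  $\limsup_{n\to\infty} \left| P[U_n(\infty) \mid \theta_n] - \{ P[U_n(L_\epsilon) \mid \theta_n] - P[U_n(-L_\epsilon) \mid \theta_n] \} \right| \le \frac{\epsilon}{2}.$

Since

$P[U_n(\infty) \mid \theta_n] = G(\mu_n, n),$

we have

(59)  $\limsup_{n\to\infty} \left| G(\mu_n, n) - \{ P[U_n(L_\epsilon) \mid \theta_n] - P[U_n(-L_\epsilon) \mid \theta_n] \} \right| \le \frac{\epsilon}{2}.$

From (51), (57) and (59) we get

(60)  $\limsup_{n\to\infty} \left| G(\mu_n, n) - \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu t)/c}\, dF(t) \right| \le \epsilon.$

Since $\epsilon$ can be chosen arbitrarily small, Proposition 2 is proved.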

4. Theorems on asymptotically most powerful tests.

Theorem 1: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \ge A_n$, where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically most powerful test of the hypothesis $\theta = \theta_0$, provided the parameter $\theta$ is restricted to values $\ge \theta_0$.

Proof: Assume that there exists a test $\{W_n\}$ of size $\alpha$ such that

(61)  $\limsup_{n\to\infty} L(W_n, M_n) = \delta > 0.$

Then there exists a subsequence $\{n'\}$ of the sequence $\{n\}$ and a sequence $\{\theta_{n'}\}$ of parameter values $\ge \theta_0$ such that

(62)  $\lim_{n'\to\infty} \{ P(W_{n'} \mid \theta_{n'}) - P(M_{n'} \mid \theta_{n'}) \} = \delta.$
n“oo 

The expression

(63)  $(\theta_{n'} - \theta_0)\sqrt{n'} = \mu_{n'} \ge 0$

must be bounded. This can be proved as follows: Since under the assumption $\theta = \theta_0$ the distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ converges to a normal distribution with zero mean and finite variance, the sequence $\{A_n\}$ must be bounded. Hence $M_n$ is defined by the inequality

(64)  $\hat\theta_n \ge \theta_0 + A_n/\sqrt{n} = \theta_0 + \epsilon_n$

where

(65)  $\lim_{n\to\infty} \epsilon_n = 0.$

From Assumption 1, (64) and (65) it follows easily that if

$\lim \theta_{n'} = \theta_1 > \theta_0, \quad \lim P(M_{n'} \mid \theta_{n'}) = 1.$

Hence on account of (62) we must have

(66)  $\lim \theta_{n'} = \theta_0.$

If there would exist a subsequence $\{n^*\}$ of $\{n'\}$ such that $\lim \mu_{n^*} = \infty$, then on account of (66) and Proposition 1 we would have $\lim P(M_{n^*} \mid \theta_{n^*}) = 1$, which is in contradiction to (62). Hence the expression (63) must be bounded. Let $\{n''\}$ be a subsequence of $\{n'\}$ such that

(67)  $\lim \mu_{n''} = \mu \ge 0.$

Denote by $F_n(z)$ the probability of the intersection of $W_n$ and the region $(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis that $\theta = \theta_0$. Consider the subsequence $\{n'''\}$ of the sequence $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$ towards a function $F(z)$. The existence of such a subsequence $\{n'''\}$ can be proved as follows: Denote the probability $P[(\hat\theta_n - \theta_0)\sqrt{n} < z \mid \theta_0]$ by $\Phi_n(z)$. On account of Proposition 1, $\Phi_n(z)$ converges with $n \to \infty$ uniformly in $z$ towards

(68)  $\psi(z) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{t^2}{2c}}\, dt$

where $c$ has the same value as in (23).

We obviously have

(69)  $F_n(z_2) - F_n(z_1) \le \Phi_n(z_2) - \Phi_n(z_1)$

for any pair of values $z_1$, $z_2$ for which $z_2 > z_1$. Hence

(70)  $\limsup_{n\to\infty} [F_n(z_2) - F_n(z_1)] \le \psi(z_2) - \psi(z_1).$

Since $F_n(z)$ is a monotonic function of $z$, our statement follows easily from (70) and the fact that $\psi(z)$ is uniformly continuous. Hence on account of Proposition 2 we have

(71)  $\lim_{n\to\infty} P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, dF(z)$

and

(72)  $\lim_{n\to\infty} P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d\Phi(z)$

where

(73)  $\Phi(z) = 0$ for $z < z_0$,

(74)  $\Phi(z) = \psi(z) - \psi(z_0)$ for $z \ge z_0$,

and $z_0$ is given by

(75)  $1 - \psi(z_0) = \alpha.$
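The integral in (72) can be evaluated in closed form. Completing the square, $e^{-\frac12(\mu^2 - 2\mu z)/c}$ times the $N(0, c)$ density is the $N(\mu, c)$ density, so the limit in (72) equals $1 - \psi(z_0 - \mu)$. The sketch below checks this numerically; it is an illustration only, the midpoint rule and the truncation point are assumptions of the sketch, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def limiting_power_one_sided(mu, alpha, c, steps=20000):
    """Evaluate the integral in (72) by the midpoint rule on [z0, upper]."""
    psi = NormalDist(0.0, math.sqrt(c))
    z0 = psi.inv_cdf(1.0 - alpha)                 # equation (75)
    upper = max(z0, mu) + 12.0 * math.sqrt(c)     # truncation point (assumption)
    h = (upper - z0) / steps
    total = 0.0
    for i in range(steps):
        z = z0 + (i + 0.5) * h
        weight = math.exp(-0.5 * (mu * mu - 2.0 * mu * z) / c)
        total += weight * psi.pdf(z) * h          # dPhi(z) = psi'(z) dz for z >= z0
    return total

def closed_form(mu, alpha, c):
    """1 - psi(z0 - mu), obtained by completing the square in the exponent."""
    psi = NormalDist(0.0, math.sqrt(c))
    z0 = psi.inv_cdf(1.0 - alpha)
    return 1.0 - psi.cdf(z0 - mu)
```

At `mu = 0` both functions return `alpha`, and the power increases with `mu`, as expected of the one-sided region under alternatives $\theta_0 + \mu/\sqrt{n}$.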

From (62), (71) and (72) we get

(76)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d[F(z) - \Phi(z)] = \delta > 0.$

Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$ be a critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single observation on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by $D(v)$ the intersection of $B$ and the region $C(v)$ defined by the inequality $y < v$. Denote by $H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then the power of the test $B$ with respect to the alternative $\nu = \mu$ is given by the following expression

(77)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dH(v).$

If the region $B$ is given by the inequality $y \ge v_0$ where $v_0$ is chosen such that the size of $B$ is equal to $\alpha$, then $H(v) = \Phi(v)$ where the function $\Phi$ is defined by the equations (73), (74) and (75). Since the latter test is uniformly most powerful⁴ with respect to all alternatives $\nu > 0$, for any positive $\mu$ the inequality

(78)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, d[H(v) - \Phi(v)] \le 0$

holds. Let

$\psi(v) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{v} e^{-\frac{t^2}{2c}}\, dt.$

It is obvious that

(79)  $H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$

and

(80)  $\int_{-\infty}^{\infty} dH(v) = \alpha.$

On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative function of $v$ such that

(79')  $K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$

and

(80')  $\int_{-\infty}^{\infty} dK(v) = \alpha$

hold, then there exists a sequence $\{B^{(i)}\}$, $(i = 1, 2, \dots,$ ad inf.), of regions of size $\alpha$ such that

$\lim_{i\to\infty} H^{(i)}(v) = K(v)$

uniformly in $v$. Since (78) holds for $H(v) = H^{(i)}(v)$ and since

$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

it is easy to see that (78) will hold also for $H(v) = K(v)$. Hence for any monotonically non-decreasing non-negative function $K(v)$ for which (79') and (80') are fulfilled, also (78) must hold. Since $F(v)$ is a distribution function which satisfies (79') and (80'), we have a contradiction to (76). This proves Theorem 1.

⁴ See for instance J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses," Stat. Res. Memoirs, Vol. 1 (1936).

Theorem 2: Let $M_n$ be the region defined by the inequality $\sqrt{n}(\hat\theta_n - \theta_0) \le A_n$, where $A_n$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically most powerful test of the hypothesis $\theta = \theta_0$, provided that the parameter $\theta$ is restricted to values $\le \theta_0$.

We omit the proof since it is entirely analogous to that of Theorem 1.

Theorem 3: Let $M_n$ be the region consisting of all points which satisfy at least one of the inequalities

$\sqrt{n}(\hat\theta_n - \theta_0) \le -A_n, \qquad \sqrt{n}(\hat\theta_n - \theta_0) \ge A_n.$

The constant $A_n > 0$ is chosen such that $P(M_n \mid \theta_0) = \alpha$. Then $\{M_n\}$ is an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$.
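The word "unbiased" in Theorem 3 can be illustrated with the limiting normal distribution of the test statistic. Under alternatives $\theta_0 + \mu/\sqrt{n}$ the statistic $\sqrt{n}(\hat\theta_n - \theta_0)$ is asymptotically $N(\mu, c)$, so the two-sided region has limiting power $\psi(-z_0 - \mu) + 1 - \psi(z_0 - \mu)$ with $\psi(-z_0) = \alpha/2$. The sketch below evaluates this and the corresponding one-sided limit; the function names are hypothetical and the code is an illustration, not part of the paper's argument.

```python
import math
from statistics import NormalDist

def two_sided_limit_power(mu, alpha, c):
    """Limiting power of the two-sided region when the statistic is N(mu, c)."""
    psi = NormalDist(0.0, math.sqrt(c))
    z0 = psi.inv_cdf(1.0 - alpha / 2.0)   # chosen so that psi(-z0) = alpha / 2
    return psi.cdf(-z0 - mu) + 1.0 - psi.cdf(z0 - mu)

def one_sided_limit_power(mu, alpha, c):
    """Limiting power of the upper one-sided region when the statistic is N(mu, c)."""
    psi = NormalDist(0.0, math.sqrt(c))
    z0 = psi.inv_cdf(1.0 - alpha)
    return 1.0 - psi.cdf(z0 - mu)
```

The two-sided limit is symmetric in `mu` and never falls below `alpha`, whereas the one-sided limit drops below `alpha` for negative `mu`. For positive `mu` the one-sided region is more powerful, which is why the two-sided region is optimal only within the class of unbiased tests.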

Proof: Assume that there exists a sequence $\{W_n\}$ $(n = 1, 2, \dots,$ ad inf.) of regions such that

(81)  $P(W_n \mid \theta_0) = \alpha,$

(82)  $\lim_{n\to\infty} g(W_n) = \alpha$

and

(83)  $\limsup_{n\to\infty} L(W_n, M_n) = \delta > 0.$

We shall deduce a contradiction from this assumption. On account of (83) there exists a subsequence $\{n'\}$ of $\{n\}$ such that

(84)  $\lim \{ P(W_{n'} \mid \theta_{n'}) - P(M_{n'} \mid \theta_{n'}) \} = \delta.$

The expression

(85)  $(\theta_{n'} - \theta_0)\sqrt{n'} = \mu_{n'}$

must be bounded. The proof of this statement is omitted, since it is analogous to the proof of the similar statement about (63). Hence there exists a subsequence $\{n''\}$ of $\{n'\}$ such that

(86)  $\lim_{n\to\infty} \mu_{n''} = \mu.$

Denote by $F_n(z)$ the probability of the intersection of $W_n$ with the region $(\hat\theta_n - \theta_0)\sqrt{n} < z$ under the hypothesis $\theta = \theta_0$. Consider a subsequence $\{n'''\}$ of $\{n''\}$ such that $F_{n'''}(z)$ converges with $n \to \infty$ towards a function $F(z)$. The existence of such a sequence $\{n'''\}$ can be proved in the same way as the similar statement in the proof of Theorem 1. Hence on account of Proposition 2 and (86) we have


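(87)  $\lim_{n\to\infty} P(W_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, dF(z)$

and

(88)  $\lim_{n\to\infty} P(M_{n'''} \mid \theta_{n'''}) = \int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d\Phi(z)$

where

(89)  $\Phi(z) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{z} e^{-\frac{t^2}{2c}}\, dt$ for $z \le -z_0$,

(90)  $\Phi(z) = \Phi(-z_0)$ for $-z_0 \le z \le z_0$,

(91)  $\Phi(z) = \Phi(-z_0) + \frac{1}{\sqrt{2\pi c}} \int_{z_0}^{z} e^{-\frac{t^2}{2c}}\, dt$ for $z \ge z_0$,

and

(92)  $\Phi(-z_0) = \tfrac12 \alpha.$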

From (84), (87) and (88) it follows that

(93)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu z)/c}\, d[F(z) - \Phi(z)] = \delta.$

Consider a normally distributed variate $y$ with mean $\nu$ and variance $c$. Let $B$ be an unbiased critical region of size $\alpha$ for testing the hypothesis $\nu = 0$ by a single observation on $y$, i.e. $B$ is a subset of the real axis $[-\infty, +\infty]$. Denote by $D(v)$ the intersection of $B$ with the region $C(v)$ defined by the inequality $y < v$. Denote by $H(v)$ the probability of $D(v)$ under the hypothesis $\nu = 0$. Then the power of the test $B$ with respect to the alternative $\nu = \mu$ is given by

(94)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dH(v).$

If the region $B$ consists of all points which satisfy at least one of the inequalities $y \le -v_0$, $y \ge v_0$, and if $v_0 > 0$ is chosen such that the size of $B$ is equal to $\alpha$, then $H(v) = \Phi(v)$, where $\Phi(v)$ is defined by the equations (89)-(92). Since the latter test is a uniformly most powerful unbiased test,⁵ for any $\mu$ the inequality

(95)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, d[H(v) - \Phi(v)] \le 0$

holds. Let

$\psi(v) = \frac{1}{\sqrt{2\pi c}} \int_{-\infty}^{v} e^{-\frac{t^2}{2c}}\, dt.$

It is obvious that

(96)  $H(v_2) - H(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

(97)  $\int_{-\infty}^{\infty} dH(v) = \alpha$

and

(98)  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dH(v)$ has a minimum for $\mu = 0$.

On the other hand, if $K(v)$ is a monotonically non-decreasing non-negative function of $v$ such that

(96')  $K(v_2) - K(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

(97')  $\int_{-\infty}^{\infty} dK(v) = \alpha,$

(98')  $\int_{-\infty}^{\infty} e^{-\frac12 (\mu^2 - 2\mu v)/c}\, dK(v)$ has a minimum for $\mu = 0$,

then there exists a sequence $\{B^{(i)}\}$ $(i = 1, 2, \dots,$ ad inf.) of unbiased regions of size $\alpha$ such that

$\lim_{i\to\infty} H^{(i)}(v) = K(v)$

uniformly in $v$. Since (95) holds for $H(v) = H^{(i)}(v)$ $(i = 1, 2, \dots,$ ad inf.), and since

$H^{(i)}(v_2) - H^{(i)}(v_1) \le \psi(v_2) - \psi(v_1)$ for $v_2 > v_1$,

it is easy to see that (95) holds also for $H(v) = K(v)$. Hence for any monotonically non-decreasing non-negative function $K(v)$ for which (96'), (97'), and (98') are fulfilled, also (95) must be fulfilled if we substitute $K(v)$ for $H(v)$. Since $F(v)$ is a distribution function which satisfies (96'), (97') and (98'), we have a contradiction to (93). This proves Theorem 3.

⁵ J. Neyman and E. S. Pearson, l. c., p. 29.


5. Appendix. Proof of the uniform consistency of $\hat\theta_n$. It will be shown here that under certain conditions on the density function $f(x, \theta)$, Assumption 1, i.e. uniform consistency of $\hat\theta_n$, can be proved.

For any open subset $\omega$ of the $\theta$-axis we denote by $\varphi(x, \omega)$ the least upper bound, and by $\psi(x, \omega)$ the greatest lower bound of $\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}$ with respect to $\theta$ in the set $\omega$. For any function $\lambda(x)$ we denote by $E_\theta \lambda(x)$ the expected value of $\lambda(x)$ under the assumption that $\theta$ is the true value of the parameter, i.e.

$E_\theta \lambda(x) = \int_{-\infty}^{\infty} \lambda(x) f(x, \theta)\, dx.$

Denote furthermore by $P(\hat\theta_n \in \omega \mid \theta)$ the probability that $\hat\theta_n$ will fall in $\omega$ under the assumption that $\theta$ is the true value of the parameter. Finally denote by $\Omega$ the parameter space and assume that $\Omega$ is either the whole real axis or a subset of it.

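Proposition 3. $\hat\theta_n$ is a uniformly consistent estimate of $\theta$, i.e. for any positive $k$

$\lim_{n\to\infty} P(-k < \hat\theta_n - \theta < k \mid \theta) = 1$

uniformly for all $\theta$ in $\Omega$, if the following two conditions are fulfilled:

Condition I. For all values $\theta$ in $\Omega$

$\int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial\theta}\, dx = \int_{-\infty}^{\infty} \frac{\partial^2 f(x, \theta)}{\partial\theta^2}\, dx = 0.$

Condition II. For any value $\theta$ in $\Omega$ there exists an open interval $\omega(\theta)$ containing $\theta$ and having the following three properties:

IIa. $\lim_{n\to\infty} P[\hat\theta_n \in \omega(\theta) \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$.

IIb. $E_\theta \varphi^2[x, \omega(\theta)]$ is a bounded function of $\theta$ in $\Omega$, and the least upper bound $A$ of $E_\theta \varphi[x, \omega(\theta)]$ with respect to $\theta$ in $\Omega$ is negative.

IIc. $E_\theta \psi[x, \omega(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.

Condition I means simply that we may differentiate under the integral sign. In fact

$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1$

identically in $\theta$. Hence

$\frac{\partial}{\partial\theta} \int_{-\infty}^{\infty} f(x, \theta)\, dx = \frac{\partial^2}{\partial\theta^2} \int_{-\infty}^{\infty} f(x, \theta)\, dx = 0.$

Differentiating under the integral sign, we obtain Condition I.

In case that $\omega(\theta)$ is the whole axis Condition IIa reduces to the condition that $\hat\theta_n$ exists.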

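In order to prove Proposition 3, we show first that for any positive $\eta$

(99)  $\lim_{n\to\infty} P\left[ \left| \frac{1}{n} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} \right| < \eta \,\Big|\, \theta \right] = 1$

uniformly for all $\theta$ in $\Omega$. We have on account of Condition I

(100)  $E_\theta \frac{\partial \log f(x, \theta)}{\partial\theta} = E_\theta \frac{1}{f(x, \theta)} \frac{\partial f(x, \theta)}{\partial\theta} = \int_{-\infty}^{\infty} \frac{\partial f(x, \theta)}{\partial\theta}\, dx = 0.$

Since

$\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} = \frac{1}{f(x, \theta)} \frac{\partial^2 f(x, \theta)}{\partial\theta^2} - \frac{1}{[f(x, \theta)]^2} \left[ \frac{\partial f(x, \theta)}{\partial\theta} \right]^2,$

we have on account of Condition I

(101)  $E_\theta \left( \frac{\partial \log f(x, \theta)}{\partial\theta} \right)^2 = -E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}.$

According to Condition II, $E_\theta \varphi[x, \omega(\theta)] < 0$ and $E_\theta \psi[x, \omega(\theta)]$ is a bounded function of $\theta$. Since $E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} < 0$ and $E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} \ge E_\theta \psi[x, \omega(\theta)]$, the left hand side of (101), i.e. the variance of $\frac{\partial \log f(x, \theta)}{\partial\theta}$, is a bounded function of $\theta$. From this and the equation (100) we obtain easily (99). Consider the Taylor expansion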

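(102)  $\frac{1}{n} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} = (\theta - \hat\theta_n)\, \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial\theta^2}$

where $\theta_n'$ lies in the interval $[\theta, \hat\theta_n]$. Let $\epsilon$ be an arbitrary positive number and denote by $Q_n(\theta)$ the region defined by the inequality

(103)  $\left| \frac{1}{n} \sum_\alpha \frac{\partial \log f(x_\alpha, \theta)}{\partial\theta} \right| < \epsilon.$

On account of (99) we have

(104)  $\lim_{n\to\infty} P[Q_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$.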

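Denote by $R_n(\theta)$ the region defined by the inequality

(105)  $\frac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)] < \tfrac12 A < 0.$

On account of Condition IIb

(106)  $\lim_{n\to\infty} P[R_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. Denote by $B_n(\theta)$ the region in which $\hat\theta_n \in \omega(\theta)$. Since $\theta_n'$ lies in the interval $[\theta, \hat\theta_n]$, we have in $B_n(\theta)$

$\frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial\theta^2} \le \frac{1}{n} \sum_\alpha \varphi[x_\alpha, \omega(\theta)].$

Hence we have in the intersection $R_n'(\theta)$ of $R_n(\theta)$ and $B_n(\theta)$

(107)  $\left| \frac{1}{n} \sum_\alpha \frac{\partial^2 \log f(x_\alpha, \theta_n')}{\partial\theta^2} \right| > \tfrac12 |A|.$

Denote by $U_n(\theta)$ the intersection of $Q_n(\theta)$ and $R_n'(\theta)$. It is obvious that

(108)  $\lim_{n\to\infty} P[U_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. From (102), (103) and (107) we get that in $U_n(\theta)$

(109)  $|\theta - \hat\theta_n| < \frac{2\epsilon}{|A|}.$

Hence on account of (108)

$\lim_{n\to\infty} P\left( |\theta - \hat\theta_n| < \frac{2\epsilon}{|A|} \,\Big|\, \theta \right) = 1$

uniformly for all $\theta$ in $\Omega$. Since $\epsilon$ can be chosen arbitrarily, Proposition 3 is proved.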

Conditions I and II are sufficient but not necessary for the uniform consistency of $\hat\theta_n$. For sufficiently small $\omega(\theta)$ the conditions IIb and IIc are rather weak. In fact, on account of (101) we have

$E_\theta \frac{\partial^2 \log f(x, \theta)}{\partial\theta^2} < 0.$

Hence for sufficiently small intervals $\omega(\theta)$, under certain continuity conditions, also $E_\theta \varphi[x, \omega(\theta)]$ will be negative. However, in some cases it may be difficult to verify IIa for small $\omega(\theta)$. On the other hand, for sufficiently large $\omega(\theta)$ (certainly for $\omega(\theta) = [-\infty, +\infty]$) IIa can easily be verified, but the conditions IIb and IIc might be unnecessarily strong. In cases where IIb or IIc does not hold for $\omega(\theta) = [-\infty, +\infty]$ and the validity of II is not apparent, the following Lemma may be useful:

Lemma: Proposition 3 remains valid if we substitute for Condition II the conditions

II'. Denote by $T_n$ the set of all points at which $\hat\theta_n$ exists and

(110)  $\sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) = 0$

has at most one solution in $\theta^*$. Then $\lim_{n\to\infty} P[T_n \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$, and

II''. There exists a positive $k$ such that for $\omega(\theta) = I(\theta) = (\theta - k, \theta + k)$ the following two conditions hold:

II''b. $E_\theta \varphi^2[x, I(\theta)]$ is a bounded function of $\theta$ in $\Omega$ and the least upper bound $A$ of $E_\theta \varphi[x, I(\theta)]$ with respect to $\theta$ in $\Omega$ is negative,

II''c. $E_\theta \psi[x, I(\theta)]$ is a bounded function of $\theta$ in the set $\Omega$.

In cases where IIb or IIc is not fulfilled for $\omega(\theta) = [-\infty, +\infty]$ the verification of II' and II'' may be easier than that of II.

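Our Lemma can be proved as follows: Consider the Taylor expansion

(111)  $\frac{1}{n} \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*) = \frac{1}{n} \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) + (\theta^* - \theta)\, \frac{1}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta')$

where $\theta'$ lies in $[\theta, \theta^*]$. Denote by $V_n(\theta)$ the region defined by

(112)  $\frac{1}{n} \sum_\alpha \varphi[x_\alpha, I(\theta)] < \tfrac12 A < 0.$

On account of II''b we have

(113)  $\lim_{n\to\infty} P[V_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. Let $W_n(\theta)$ be the region defined by

(114)  $\left| \frac{1}{n} \sum_\alpha \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) \right| < \epsilon.$

From Condition I and Condition II'' it follows easily that

(115)  $\lim_{n\to\infty} P[W_n(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. For all values $\theta^*$ in the interval $I(\theta)$ we have

(116)  $\frac{1}{n} \sum_\alpha \varphi[x_\alpha, I(\theta)] \ge \frac{1}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta').$

Because of (112) and (116) we have in $V_n(\theta)$

(117)  $\frac{1}{n} \sum_\alpha \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta') < \tfrac12 A < 0$

for all values $\theta^*$ in the interval $I(\theta)$. Let $\epsilon$ be less than $|\tfrac12 kA|$. Then in the intersection $W_n'(\theta)$ of the regions $V_n(\theta)$ and $W_n(\theta)$ we obviously have on account of (114) that the values of the left hand side of (111) for $\theta^* = \theta + k$ and $\theta^* = \theta - k$ will be of opposite sign. Hence at any point of $W_n'(\theta)$ the equation (110) has at least one root which lies in the interval $I(\theta)$. Since (110) has at most one root in $T_n$ and since $\hat\theta_n$ is a root of (110), we get that at any point of the intersection $W_n''(\theta)$ of $W_n'(\theta)$ and $T_n$, $\hat\theta_n$ lies in $I(\theta)$. Since

(118)  $\lim_{n\to\infty} P[W_n''(\theta) \mid \theta] = 1$ uniformly for all $\theta$ in $\Omega$,

also

(119)  $\lim_{n\to\infty} P[\hat\theta_n \in I(\theta) \mid \theta] = 1$

uniformly for all $\theta$ in $\Omega$. The relation (119) combined with the conditions II''b and II''c is equivalent to Condition II. Hence our Lemma is proved.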
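The sign-change step of the proof can be sketched numerically. The example below assumes a logistic location family (an illustrative choice, not from the paper), for which the score $\partial \log f(x, \theta)/\partial\theta = \tanh((x - \theta)/2)$ is strictly decreasing in $\theta$, so the likelihood equation (110) has at most one root. If the averaged score has opposite signs at $\theta - k$ and $\theta + k$, bisection locates that root inside $I(\theta)$. The function names are hypothetical.

```python
import math

def score(theta, xs):
    """Averaged score (1/n) * sum of d/dtheta log f(x, theta), logistic location family."""
    return sum(math.tanh((x - theta) / 2.0) for x in xs) / len(xs)

def root_in_interval(xs, center, k, tol=1e-10):
    """Find the root of the averaged score in (center - k, center + k) by bisection.

    Returns None when the score does not change sign from positive to
    negative across the interval, i.e. when the sign argument used in
    the proof of the Lemma does not apply at this sample point.
    """
    lo, hi = center - k, center + k
    if not (score(lo, xs) > 0.0 > score(hi, xs)):
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xs) > 0.0:
            lo = mid      # score is decreasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a sample concentrated near zero, `root_in_interval(xs, 0.0, 2.0)` returns the maximum likelihood estimate, while an interval centered far from the data returns `None`.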



EXPERIMENTAL DETERMINATION OF THE MAXIMUM OF A FUNCTION¹

By Harold Hotelling

Columbia University, New York City
1. The necessary background for efficient experimental determinations. We 

shall deal with the problem of arranging an experiment for determining the 
value of X for which an unknown function f(x) is a maximum or minimum. 
This problem is to be distinguished from those of estimating the maximum or 
minimum itself, and of studying the distributions of such estimates, problems 
to which Bernstein [1] and Rice [2] have contributed. 

The range of applications in which determinations of maximizing and mini- 
mizing values are important is extremely wide. Among these are the deter- 
mination of the time of year at which the number of algae or bacilli in a lake 
is a maximum, and the amount of fertilizers and of irrigation water making the 
yield of a crop a maximum. The magnetic permeabilities of permalloys, perminvars and permendurs as functions of the induction, and the hardness of a copper-iron alloy as a function of the time of aging at 500°C., possess smooth maxima having interest in telephony, [3], [4]. The effective range of a gun is a
function of the speed of burning of the powder, a variable which can be con- 
trolled. Almost every entrepreneur has a fervent desire to know the selling 
prices that will yield a maximum profit, and a few have undertaken controlled 
experiments with a view to finding out. There are also numerous practical 
problems of minimizing costs; for example, the cost of operating a ship as a 
function of its speed possesses a minimum. We shall confine our attention chiefly to the experimental determination of maxima, since such problems seem to occur naturally with greater frequency in applications; there is no loss of generality in this, since $f(x)$ has a maximum where $-f(x)$ has a minimum.

We shall assume that, for each value of x in the set we shall select, one or
more observations will be made on y = f(x), and that these observations are
afflicted with errors which are independently distributed about zero with a
common variance $\sigma^2$. From this it follows that if f(x) is a linear function of
known functions of x, with unknown coefficients $\beta_0, \beta_1, \ldots, \beta_p$ (for example
a polynomial in x), the most efficient method of fitting is the method of least
squares, which yields unbiased estimates $b_0, \ldots, b_p$ of $\beta_0, \ldots, \beta_p$ having the
least possible variances; this is true whether or not the errors are normally
distributed. If the fourth moment of the errors is finite, and if the number N

^1 Presented at the joint meeting of the Institute of Mathematical Statistics and the
American Mathematical Society at Hanover, September 10, 1940.


of observations is large, the estimated coefficients will be distributed in an
approximately normal manner; and so also will any function of them that is
regular in a fixed neighborhood of its "population value." By the "population
value" of a function $\phi(b_0, \ldots, b_p)$ we mean $\phi(\beta_0, \ldots, \beta_p)$. In particular, if

$f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_p x^p$

has a maximum for $x = \xi$ of the simplest type, such that $f'(\xi) = 0$ and $f''(\xi) < 0$,
so that $\xi$ is a simple root of the equation

$f'(\xi) = \beta_1 + 2\beta_2 \xi + \cdots + p\beta_p \xi^{p-1} = 0,$

and if $x_0$ is an estimate of $\xi$ found from the polynomial fitted by the method of
least squares, so that

$b_1 + 2b_2 x_0 + \cdots + p b_p x_0^{p-1} = 0,$

this last equation defines $x_0$ as a function of $b_1, \ldots, b_p$. The function is, to
be sure, multiple-valued when p > 2; but for sufficiently large values of N the
probability will become arbitrarily great that the roots obtained from a random
experiment will each differ by an arbitrarily small quantity from one of the roots
of $f'(x) = 0$. Then provided we have a sufficient preliminary approximate knowledge
of $\xi$ we may choose the root nearest $\xi$, and the probability distribution
of this root, which in nearly all experiments will be a single-valued function

$\phi(b_1, \ldots, b_p),$

will approach normality of form, with standard error of order $N^{-1/2}$ about a
mean differing from

$\xi = \phi(\beta_1, \ldots, \beta_p)$

at most by terms of order $N^{-1}$, which are thus negligible in comparison with the
standard error. The situation will be effectively the same if, without knowing $\xi$
in advance even approximately, we choose the root $x_0$ giving the greatest value
$f(x_0)$, provided $f(\xi)$ is greater than any other value of $f(x)$.
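The procedure just described (fit a polynomial by least squares, then take the root of the fitted derivative lying near a prior guess of the maximizing value) may be sketched numerically. The following is only an illustration under assumptions that are not in the text: the test function $x^3 - 3x$, the grid of x values, the absence of observational error, and the helper names `solve`, `fit_polynomial` and `maximizing_root` are all hypothetical, and Newton's iteration started from the prior guess is used as a convenient stand-in for "the root nearest" that guess.

```python
from fractions import Fraction

def solve(A, y):
    """Solve the square linear system A c = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]

def fit_polynomial(xs, ys, p):
    """Least-squares coefficients b_0..b_p obtained from the normal equations."""
    A = [[sum(x ** (i + j) for x in xs) for j in range(p + 1)] for i in range(p + 1)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(p + 1)]
    return solve(A, rhs)

def maximizing_root(b, guess, steps=50):
    """Newton iteration on the fitted derivative, started from a prior guess."""
    d1 = [k * b[k] for k in range(1, len(b))]      # coefficients of Y'
    d2 = [k * d1[k] for k in range(1, len(d1))]    # coefficients of Y''
    x = float(guess)
    for _ in range(steps):
        f1 = sum(c * x ** k for k, c in enumerate(d1))
        f2 = sum(c * x ** k for k, c in enumerate(d2))
        x -= f1 / f2
    return x

# Hypothetical error-free data from f(x) = x^3 - 3x, whose maximum is at x = -1.
xs = [Fraction(i, 2) for i in range(-4, 5)]
ys = [x ** 3 - 3 * x for x in xs]
b = fit_polynomial(xs, ys, 3)
x0 = maximizing_root(b, guess=-0.8)
print(b, x0)
```

With observational errors added to `ys`, the same two steps apply unchanged; only the fitted coefficients, and hence $x_0$, acquire sampling error.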

From these considerations it appears advisable, whenever the unknown func- 
tion is capable of being represented adequately by a polynomial of degree p 
considerably less than the number N of observations, to fit a polynomial of 
degree p by least squares, and from it to determine the maximizing value by 
differentiation. In practice, however, there are obstacles to carrying out such 
a procedure with confidence. The form of the function is usually not known; 
it is far from clear what value should be given p even if the function is to be 
regarded as a polynomial; the use of a polynomial which does not give a
sufficiently good fit, with observations taken at a considerable distance from the
maximizing value, perhaps separated from it by other maxima and minima,
appears to be a highly dubious proceeding; and if p is taken large, the labor of
calculation becomes excessive. For all these reasons it is desirable to assign
the values of x which are to be the basis of the experimental work close enough to
the maximizing value $\xi$ so that a polynomial of very low degree will fit
adequately in the neighborhood.

We shall restrict ourselves to functions having continuous derivatives of all
relevant orders^2 in a neighborhood of $\xi$. Such a function can in a sufficiently
small neighborhood be approximated by a polynomial of the second degree.
The necessity of using a polynomial of higher degree can therefore be avoided,
when a fairly good knowledge of the function is already in hand, and when the
number N of observations that can be made is large enough, by choosing all
the values of x in a sufficiently small neighborhood of $\xi$. We shall suppose that
this is done; that is, a regression equation

$Y = b_0 + b_1 x + b_2 x^2$

is fitted by least squares to a large number of observations after choosing the
values of x quite close to the true maximizing value $\xi$, and the estimate $x_0$ of $\xi$
is a solution of $dY/dx = b_1 + 2b_2 x = 0$, so that

$x_0 = -\dfrac{b_1}{2b_2}.$

We shall examine the errors in $x_0$ arising both from the inadequacy that may
exist in the quadratic approximation and from the random errors of observation,
and shall consider what distribution of x may most appropriately be chosen to 
reduce the errors of both kinds, and to place them in a suitable balance with 
each other. 

It will be observed that a fairly definite preliminary knowledge of the function 
under investigation is required for such a program. Any criterion for the
selection of values of x for experimentation must involve not only the value of $\xi$
but also the values of the first few derivatives in a neighborhood of $\xi$, or some
similar information. The requirement of preliminary information is essential
for the efficient design of experiments in general. For instance the efficiency 
of an agricultural field experiment depends on the correctness of the appraisal, 
before the experiment is laid down, of the general nature of the fertility gradients 
likely to exist in the field and of the variances due to error and main effects 
which will be revealed more accurately by the experiment itself. If the
preliminary information is incorrect, a properly arranged self-contained experiment
will nevertheless give results which are valid, in the sense that the significance
probabilities calculated from them by accurate methods are correct, but will be
inefficient, in the sense that another experiment of the same cost, based on better
preliminary information, would be more likely to detect real effects through the
smallness of such a calculated probability. The efficient conduct of
experimentation thus proceeds in stages of ascending magnitude. A large-scale
investigation should be preceded by a smaller one designed primarily to obtain
information for use in designing the large one. The small preliminary
investigation may well in turn be preceded by a still smaller pre-preliminary investigation,
and so on,^3 like an army marching after an advance guard, which follows a more
advanced smaller detachment, which follows a still smaller and still more
advanced unit, which follows a "point." At the very beginning of the process of
chain experimentation will stand work based on little or no clear information
of the kind required for efficient design. This first phase will be speculative
and exploratory in character. Neither its cost nor its accuracy can well be
estimated in advance. It is a favorite, but not exclusive, preoccupation of men
of genius. Many of its results turn out to be worthless. But it is an essential
preliminary to well-organized research directed to definite aims defined
qualitatively in advance.

^2 Other cases may well arise in practice and deserve separate consideration in connection
with the particular investigations in which they arise. For example various physical
properties of alloys, regarded as functions of the proportion of a particular constituent,
have maxima, but may have discontinuous derivatives because of the phenomena of
crystallization and solution of one metal in another. The assumptions appropriate to an
investigation, parallel to that of the present paper, of the proper organization of experiments
for finding such metallurgical maxima must be drawn from metallurgy. The case of
continuous derivatives is however of widespread importance. If no regularity assumption is
made about the function, one set of N values of x is as good as another, and no set is likely
to tell us very much about the function if it is one of the violently irregular ones utilized
in the theory of functions to emphasize the necessity of studying that subject.

After the first speculative and unsystematic phase in the knowledge of a 
subject is past, but before the careful, economical organization of an accurate 
investigation, an intermediate type of exploration is needed to supply estimates 
of the parameters required for the design of the full-scale investigation. In the 
present case such a systematic though small-scale experiment might perhaps 
consist in dividing a range within which the desired maximizing value $\xi$ is known
to lie into equal parts, making at least two observations at each of the ends of
these intervals, and fitting a polynomial of at least the fifth degree by least
squares. This will make possible estimates of the parameters $\sigma, \beta_1, \beta_2, \ldots, \beta_5$
(and hence of $\xi$) required for using the efficient designs which we shall obtain.
At least six different values of x are required for fitting the polynomial of the
fifth degree. The fitting process is facilitated by taking them in arithmetic
progression and using orthogonal polynomials.


^3 A remarkable example of such a series of investigations is the chain of sample censuses
of area of jute in Bengal carried out for the Indian Central Jute Committee under the
direction of Prof. P. C. Mahalanobis annually beginning in 1937. Each year's work is
designed primarily to obtain information for planning the next year's, and a sequence of
four or five such investigations, each considerably larger than the preceding, is planned
to lead up to an eventual annual sampling of the whole immense jute area in the province.
A partial account of this is given in [5], a fuller one in confidential but printed reports of
the Indian Central Jute Committee, Calcutta.

Certain multiple-sample schemes in manufacturing inspection also provide good
examples of chain experiments, [6].

2. Sampling errors and bias in the quadratic approximation. Let us measure
all values of x from the value $\xi$ under investigation which makes f(x) a maximum.
Then $\xi = 0$, and in the expansion

(1) $f(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots$

we shall have $\beta_1 = 0$ and $\beta_2 \le 0$; we shall assume that $\beta_2 < 0$. An observation
$y_\alpha$ corresponding to a chosen value $x_\alpha$ will have, by assumption, an error $\Delta_\alpha$ of
zero expectation and variance $\sigma^2$, such that

(2) $y_\alpha = f(x_\alpha) + \Delta_\alpha.$

A quadratic estimate

(3) $Y = b_0 + b_1 x + b_2 x^2$

of f(x) is obtained by means of normal equations which may be written

(4) $\begin{aligned} a_0 b_0 + a_1 b_1 + a_2 b_2 &= Sy \\ a_1 b_0 + a_2 b_1 + a_3 b_2 &= Sxy \\ a_2 b_0 + a_3 b_1 + a_4 b_2 &= Sx^2 y, \end{aligned}$

where S stands for summation over all the observations, so that, for example,
$Sy = \sum y_\alpha = y_1 + y_2 + \cdots + y_N$, and where

(5) $a_k = Sx^k.$

In particular, $a_0 = N$. A determinate solution is possible only if there are at
least three distinct values of x; we shall always suppose therefore that this is
the case. This is equivalent to assuming that the determinant a of the
coefficients in (4) is not zero. A greater number of observations y is necessary to
obtain an estimate of the variance $\sigma^2$, and furthermore we shall suppose this
number large in our approximations, but since repeated observations may be
made for each value of x, it is not essential that there be more than three values
of x in the distribution to be selected.

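If we put

(6) $\delta b_k = b_k - \beta_k, \qquad \gamma_k = Sx^k \Delta$

for k = 0, 1, 2, substitute (1) in (2) and the result in (4), and utilize (5) and
(6), we obtain

(7) $\begin{aligned} a_0\,\delta b_0 + a_1\,\delta b_1 + a_2\,\delta b_2 &= \gamma_0 + a_3\beta_3 + a_4\beta_4 + \cdots \\ a_1\,\delta b_0 + a_2\,\delta b_1 + a_3\,\delta b_2 &= \gamma_1 + a_4\beta_3 + a_5\beta_4 + \cdots \\ a_2\,\delta b_0 + a_3\,\delta b_1 + a_4\,\delta b_2 &= \gamma_2 + a_5\beta_3 + a_6\beta_4 + \cdots \end{aligned}$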

From these equations it follows that the errors $\delta b_k$ are homogeneous linear
functions of the right-hand members and will therefore be small if the quantities on
the right are small. Of these quantities, the $\gamma_k$'s will be stochastically of the
order $N^{1/2}$ for large samples with any fixed set of values of x. When the
equations are solved, their coefficients will be of the order of $N^{-1}$, so that the product
is of order $N^{-1/2}$ and becomes negligible if N is large enough. The coefficients
$a_k$ of $\beta_3, \beta_4, \ldots$ can be kept small if the values of x are chosen to lie within a
sufficiently restricted range. Of course the coefficients $a_k$ in the left members of (7) will
also be small in this case, but not small enough to offset fully the smallness of
those on the right. To see this, we observe that if all the values of x be
multiplied by any quantity g, $a_k$ is multiplied by $g^k$, while

(8) $a = \begin{vmatrix} a_0 & a_1 & a_2 \\ a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \end{vmatrix}$

is multiplied by $g^6$. The cofactors of the last column are proportional
respectively to $g^4$, $g^3$ and $g^2$. Hence, in the expression for $\delta b_2$, the coefficient of $\beta_3$ is
of order g, that of $\beta_4$ is of order $g^2$, and so on, the coefficients of the $\beta$'s of higher
orders vanishing more and more rapidly with g as we go on in the sequence.
The like is true of $\delta b_1$ and $\delta b_0$, which vanish even more rapidly with g. Thus
we may, by restricting sufficiently the range of x on the basis of the assumed
preliminary knowledge of the function, and taking a sufficiently large sample
of observations, bring it about that the probability will be arbitrarily close to
unity that the $\delta b_k$'s are less than any assigned limits.
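The dependence of the bias on the multiplier g can be seen in a small exact computation. The sketch below is an illustration only: the test function $-x^2 + x^3$ (so that $\beta_2 = -1$, $\beta_3 = 1$ and $\xi = 0$), the symmetric three-point design, the omission of observational error, and the names `det3` and `quadratic_fit` are assumptions made here, not taken from the text. It solves the normal equations (4) by Cramer's rule and records the error in $b_1$ and in $x_0$ as the design is shrunk.

```python
from fractions import Fraction

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def quadratic_fit(xs, ys):
    """Solve the normal equations (4) for b0, b1, b2 by Cramer's rule."""
    a = [sum(x ** k for x in xs) for k in range(5)]               # power sums a_k
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[a[i + j] for j in range(3)] for i in range(3)]
    d = det3(A)
    coeffs = []
    for col in range(3):
        Ak = [[rhs[i] if j == col else A[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(Ak) / d)
    return coeffs

def f(x):
    # Hypothetical test function: beta_2 = -1, beta_3 = 1, maximum at xi = 0.
    return -x ** 2 + x ** 3

base = [Fraction(-1), Fraction(0), Fraction(1)]
results = {}
for g in (Fraction(1), Fraction(1, 2), Fraction(1, 4)):
    xs = [g * x for x in base]
    b0, b1, b2 = quadratic_fit(xs, [f(x) for x in xs])
    results[g] = (b1, -b1 / (2 * b2))     # (error in b1, error in x0)
    print(g, b1, -b1 / (2 * b2))
```

For this design the error in $b_1$ is $g^2$ and the error in $x_0$ is $g^2/2$, so halving the range of x quarters the bias, in agreement with the orders stated above.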

Let us, in particular, restrict the range sufficiently and take a large enough
sample to make it reasonable to regard $\delta b_2$ as negligible in comparison with $\beta_2$.
The error in the estimate

(9) $x_0 = -\dfrac{b_1}{2b_2}$

of the maximizing value $\xi$ will, since we are taking $\xi = 0$, be $x_0$ itself, and may
be written

$\delta x_0 = -\dfrac{\delta b_1}{2(\beta_2 + \delta b_2)} = -\dfrac{\delta b_1}{2\beta_2}\left(1 - \dfrac{\delta b_2}{\beta_2} + \cdots\right),$

where the terms other than 1 in the last parentheses are negligible. The problem
of minimizing the error $\delta x_0$ is then virtually equivalent to minimizing the error
$\delta b_1$. In section 5 it will be shown that it is not until we reach terms of the
order of $g^4$ that the errors $\delta b_2$ need be taken into account. We shall first discuss
the errors in $x_0$ of lower orders in g, and thus confine the discussion to $\delta b_1$. For
the present we shall take as the quantity to be made as small as possible the
expectation of the square of this last error, $E(\delta b_1)^2$. This is not the same as the
variance of $b_1$, since $E\delta b_1$ is not in general zero. We have, in fact, by
transposing a familiar formula for the variance,

(10) $E(\delta b_1)^2 = (E\delta b_1)^2 + \sigma_{b_1}^2,$

thus dividing our minimand into two parts, due respectively to the bias arising
from the neglect of terms of third and higher orders, and to the usual sampling
errors.

By the usual least-square theory, the sampling variance of $b_1$ is

(11) $\sigma_{b_1}^2 = \mu\sigma^2,$

where $\mu$ is the cofactor of the central element in a, divided by a, that is,

(12) $\mu = (a_0 a_4 - a_2^2)/a.$

Since $\mu$ is of the order of $g^{-2}$, we may reduce the sampling variance as much as
we please by taking the values of x sufficiently far removed from $\xi$. If f(x) is
definitely known to be only of the second degree, a wide dispersion of the
desirable values of x is thus indicated, since in this case $E\delta b_1 = 0$, as appears by
taking the expectation of each term in (7). But if, as will usually be the case,
f(x) has terms of higher orders than the second, an excessively wide dispersion
may increase the bias $E\delta b_1$ to such an extent as to render the quadratic
approximation inapplicable.

In taking the expectation of each term of (7) and then solving for $E\delta b_1$ we
obtain, since $E\gamma_k = 0$ according to the definition of $\gamma_k$, and because $E\Delta = 0$,
a result of the form

(13) $E\delta b_1 = B_3\beta_3 + B_4\beta_4 + B_5\beta_5 + \cdots.$

We shall call $B_3$, $B_4$, and $B_5$ respectively the cubic, quartic and quintic
components of the bias, or simply biases. If we denote by $\lambda$, $\mu$, $\nu$, the ratios to a
of the cofactors of the second column of a, so that

(14) $\lambda a_1 + \mu a_2 + \nu a_3 = 1,$

we shall have for the components of bias,

(15) $\begin{aligned} B_3 &= \lambda a_3 + \mu a_4 + \nu a_5 \\ B_4 &= \lambda a_4 + \mu a_5 + \nu a_6 \\ B_5 &= \lambda a_5 + \mu a_6 + \nu a_7, \end{aligned}$

and so forth. Since $\lambda$, $\mu$, and $\nu$ are of respective orders -1, -2 and -3 in a
multiplier g of all the values of x, $B_3$ is of order 2, $B_4$ is of order 3, and the higher
biases are of higher orders. Thus if we begin with any particular distribution
of x and apply a sufficiently small multiplier g, we can make the quartic bias
negligible in comparison with the cubic, the quintic in comparison with the
quartic, and so forth, provided none of these biases is zero. But in reducing
g we increase the sampling variance, which is of the order of $g^{-2}$.
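Formulas (12), (14) and (15) can be evaluated directly from the power sums. The following sketch is illustrative only; the design values -1, 0, 2 (one observation each) and the function names are hypothetical choices made here. It computes $\lambda$, $\mu$, $\nu$ as cofactor ratios, checks identity (14) exactly, and shows the orders 2, 3 and 4 of $B_3$, $B_4$, $B_5$ when every x is doubled.

```python
from fractions import Fraction

def power_sums(xs, kmax):
    return [sum(Fraction(x) ** k for x in xs) for k in range(kmax + 1)]

def second_column_ratios(a):
    """lambda, mu, nu: cofactors of the second column of (8), divided by a."""
    det = (a[0] * (a[2] * a[4] - a[3] ** 2)
           - a[1] * (a[1] * a[4] - a[3] * a[2])
           + a[2] * (a[1] * a[3] - a[2] ** 2))
    lam = -(a[1] * a[4] - a[2] * a[3]) / det
    mu = (a[0] * a[4] - a[2] ** 2) / det          # formula (12)
    nu = -(a[0] * a[3] - a[1] * a[2]) / det
    return lam, mu, nu

def bias_components(xs):
    """B_3, B_4, B_5 from (15) for a design listed value by value."""
    a = power_sums(xs, 7)
    lam, mu, nu = second_column_ratios(a)
    assert lam * a[1] + mu * a[2] + nu * a[3] == 1    # identity (14)
    return [lam * a[j] + mu * a[j + 1] + nu * a[j + 2] for j in (3, 4, 5)]

small = bias_components([-1, 0, 2])      # hypothetical design
large = bias_components([-2, 0, 4])      # same design with multiplier g = 2
print(small, large)
```

Doubling the design multiplies the three components by 4, 8 and 16 respectively, while $\mu$, and with it the sampling variance, is divided by 4.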

Under these conditions it is reasonable to consider what types of distribution
having a fixed value of the sampling variance make the cubic bias a minimum
in absolute value; then if there is more than one distribution of this kind, to
seek among them a class minimizing the absolute value of the total of cubic and
quartic biases; and among these a class minimizing the absolute value of the
total of cubic, quartic and quintic biases, with the modified meaning of the
quintic bias taking account of $\delta b_2$.


3. The cubic and quartic biases. We find, somewhat unexpectedly, that
there exists a class of distributions of x for which the cubic bias is actually zero.
To exemplify this we need give the variable no more than three different values,
which we may call x, y and z, and we may assign to them the arbitrary
frequencies k, m, n of experiments (k + m + n = N). If we put

(16) $P = \begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^2 & y^2 & z^2 \end{vmatrix} = (x - y)(y - z)(z - x),$

and consider a matrix of three rows and N columns, of which k columns are
identical with the first column of P, m with the second, and n with the third,
it is evident that the sum of the squares of the three-rowed determinants in
this matrix is $kmnP^2$. But this sum of squares is also equal to the determinant
formed from the sums of products of the three rows, and this is a (formula (8)).
Thus $a = kmnP^2 \neq 0$, since x, y, z are all different. Together with the
foregoing 3 × N matrix consider another,

(17) $\begin{pmatrix} 1 & 1 & 1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{pmatrix},$

having k columns identical with that first written, m identical with the second
written, and n identical with the third. The only non-vanishing three-rowed
determinants in this matrix are formed of these three different columns, and
equal $(xy + yz + zx)P$; there are kmn of them. The sum of products of
corresponding three-rowed determinants in the two matrices is therefore
$kmnP^2(xy + yz + zx)$. But this sum is also equal to the determinant, formed
from the sums of products of corresponding rows,

$\begin{vmatrix} a_0 & a_2 & a_3 \\ a_1 & a_3 & a_4 \\ a_2 & a_4 & a_5 \end{vmatrix},$

which, by (15), equals $-aB_3$. It follows that

(18) $-B_3 = xy + yz + zx.$

There are many real solutions of the equation

(19) $xy + yz + zx = 0,$

with the three values all different, for example -2, 3, 6. If we assign such values
to our variable, and an arbitrary number of experimental determinations to each
of these values, the cubic bias $B_3$ will be zero.

It will be noticed that such a solution cannot have zero for one of the values.
If, for example, z = 0 in (19), then x or y must also vanish, in violation of the
condition that there must be at least three distinct values. Moreover a
solution cannot be symmetrical about zero; if x + y = 0 it follows from (19) that
x = y = 0. A solution may or may not be symmetrical about a value other
than zero. The values $3 - 2\sqrt{3}$, $(3 - \sqrt{3})/2$, $\sqrt{3}$ satisfy the equation and
are in arithmetic progression, while the solution -2, 3, 6 is asymmetrical.
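Solutions of (19) are easily generated: any two values x and y with x + y different from zero determine the third as $z = -xy/(x + y)$. A brief sketch follows, in which the function names are hypothetical and formula (18) is used for the cubic bias; it reproduces the example -2, 3, 6 and checks the arithmetic-progression solution in floating point.

```python
from fractions import Fraction
from math import sqrt

def third_value(x, y):
    """Solve xy + yz + zx = 0 for z, given x and y with x + y != 0."""
    x, y = Fraction(x), Fraction(y)
    return -x * y / (x + y)

def cubic_bias(x, y, z):
    return -(x * y + y * z + z * x)       # formula (18)

z = third_value(-2, 3)
print(z, cubic_bias(-2, 3, z))

r = sqrt(3)
progression = (3 - 2 * r, (3 - r) / 2, r)
print(cubic_bias(*progression))           # zero up to rounding error
```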

If we modify (17) by replacing the cubes of the variables by their fourth
powers, and apply the same procedure to the modified matrix, we find that

(20) $B_4 = -(x + y)(y + z)(z + x).$

Thus there exist sets of three distinct real values making the quartic bias vanish,
for example any set for which x + y = 0; but no such set can at the same time
nullify the cubic bias (18). Since it is ordinarily more important for the cubic
than for the quartic bias to vanish, distributions nullifying (20) are not in
general to be recommended. But in exceptional cases it may be known that
$\beta_3$ is zero, or very small in comparison with $\beta_4$, and then the vanishing of $B_4$
is a more valuable property than that of $B_3$. It will be shown that no
distribution of three or more values exists such that both the cubic and quartic
components of bias are zero.

Let us denote by $D_p$ the p-rowed determinant having $a_{i+j-2}$ as the element
in its ith row and jth column. Thus $D_3$ is the same determinant which we have
in (8) called a, and

(21) $D_4 = \begin{vmatrix} a_0 & a_1 & a_2 & a_3 \\ a_1 & a_2 & a_3 & a_4 \\ a_2 & a_3 & a_4 & a_5 \\ a_3 & a_4 & a_5 & a_6 \end{vmatrix}.$

For every distribution, every $D_p \ge 0$; and a necessary and sufficient condition
that a distribution have p or more distinct values is that $D_p$ be greater than
zero [7, p. 362]. If $D_p$ is positive, so is each of its principal minors. In
particular, since we are requiring at least three values in a distribution, $D_3 = a > 0$,
and therefore

(22) $a_2 a_4 - a_3^2 > 0,$

and

(23) $a_0 a_4 - a_2^2 > 0.$

We shall now consider distributions for which the cubic bias $B_3$ is zero, and
consequently, by (15),

(24) $\lambda a_3 + \mu a_4 + \nu a_5 = 0,$

and expand $D_4$. From the definition of $\lambda$, $\mu$, $\nu$, we have

(25) $\lambda a_2 + \mu a_3 + \nu a_4 = 0.$

Multiply the last row of the determinant (21) by $\nu$, and add to it $\lambda$ times the
second row and $\mu$ times the third. The last row is thus, by (14), (25), (24)
and (15) transformed into

$1 \quad 0 \quad 0 \quad B_4,$

while the determinant has been multiplied by $\nu$. Let this new determinant be
expanded with respect to its last row. The cofactor of the first element 1 is

$G = -\begin{vmatrix} a_1 & a_2 & a_3 \\ a_2 & a_3 & a_4 \\ a_3 & a_4 & a_5 \end{vmatrix}.$

Let the last row of this determinant be multiplied by $\nu$, an operation having
the effect of multiplying the whole determinant by $\nu$; and let $\lambda$ times the first
row and $\mu$ times the second row then be added to the last. The last row is
thus, by (14), (25) and (24) reduced to

$1 \quad 0 \quad 0.$

Hence

$\nu G = -(a_2 a_4 - a_3^2),$

and consequently

(26) $\nu^2 D_4 = \nu(aB_4 + G) = \nu a B_4 - (a_2 a_4 - a_3^2).$

Since the first member of this equation is positive or zero, (22) shows that it is
impossible that $B_4$ should equal zero when $B_3 = 0$ as we have assumed. That is,
either the cubic or the quartic bias of every distribution having three or more distinct
values must be different from zero.
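The role of the determinants $D_p$ in counting distinct values can be checked on small designs. In the sketch below the two designs, their frequencies, and the function names are hypothetical; determinants are expanded by the first row, which is adequate for four rows, and all arithmetic is exact.

```python
from fractions import Fraction

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def hankel(points, p):
    """D_p for a design given as {value: frequency}."""
    a = [sum(f * Fraction(x) ** k for x, f in points.items())
         for k in range(2 * p - 1)]
    return det([[a[i + j] for j in range(p)] for i in range(p)])

three = {2: 1, -3: 2, -6: 3}            # three values satisfying (19)
four = {2: 1, -3: 2, -6: 3, 1: 1}       # one further value added

print(hankel(three, 3), hankel(three, 4), hankel(four, 4))
```

For the three-valued design $D_3 = kmnP^2 = 86400$ and $D_4 = 0$, while the addition of a fourth distinct value makes $D_4$ positive.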

If $\nu$ were zero, (26) would contradict (22). Hence $\nu \neq 0$. With every
distribution of x there is associated another obtained from it by changing the sign
of each value of x. Such a pair of distributions we shall call opposite. When
we pass from a distribution to its opposite, the power-sums $a_k$ remain
unchanged when k is even and change only in sign when k is odd. Since a is
always positive, and since

(27) $\nu = (a_1 a_2 - a_0 a_3)/a,$

$\nu$ has opposite signs and the same absolute value for opposite distributions.
The conclusions to be reached shortly will be equally valid for a distribution
and its opposite, and in reaching them we may assume $\nu > 0$. It will then
follow from (22) and (26) that $B_4 > 0$.

4. Distributions nullifying cubic bias with minimum quartic bias. We can
now prove the following theorem:

Among distributions for which the cubic bias vanishes and the standard error of
$b_1$ has a fixed value, those for which the quartic bias is a minimum have exactly
three distinct values of the variable. These values satisfy the equation

(28) $xy + yz + zx = 0.$

Since the standard error $\sigma$ of a single observation is not affected by the
distribution chosen for x, fixation of the standard error of $b_1$ is equivalent by (11)
to fixation of the value of the expression $\mu$ given by (12). We suppose therefore
that $\mu$ has some fixed positive value and that $B_3 = 0$. Since $\mu$, $B_3$ and $B_4$ do
not involve the distribution of x excepting through the power-sums $a_0, a_1, \ldots, a_6$,
we may treat these power-sums as the independent variables in trying to
make $B_4$ a minimum. Their region of variation is limited by the inequalities
referred to in the preceding section,

$D_1 = a_0 > 0, \quad D_2 > 0, \quad D_3 = a > 0, \quad D_4 \ge 0.$

The inequalities $D_p \ge 0$ for p > 4 involve power-sums of orders higher than
the sixth and are irrelevant to our purpose.

The definition (8) of a shows that it is independent of $a_5$ and $a_6$; consequently
$\lambda$, $\mu$, and $\nu$ are also. According to (15), $B_3$ involves $a_5$ but not $a_6$; while of all
the expressions we have considered, only $B_4$ and $D_4$ are functions of $a_6$.
Therefore when $a_0, a_1, \ldots, a_5$ are given any definite values, $a_6$ may be chosen to
make $B_4$ a minimum without any regard to the fixed values of $\mu$ and $B_3$. Now
(15) shows that $B_4$ is a linear function of $a_6$ with a coefficient $\nu$ which, at the end
of the last section, we have proved not to be zero and assumed positive. Thus
$B_4$, which is also positive, is an increasing function of $a_6$. Its minimum will
correspond to the least value of $a_6$ consistent with the condition $D_4 \ge 0$. But
(21) shows that $D_4$ is also a linear function of $a_6$ with a positive
coefficient, a. The minimum of $a_6$, and therefore that of $B_4$, require therefore
that $D_4 = 0$. But $D_4 = 0$ is exactly the condition that there should be no more
than three distinct values in the distribution. Since there must be at least
three distinct values, and since if there are only three they must satisfy (19),
the theorem is proved.

The minimum value of $B_4$ with respect to variations of $a_6$ when $B_3 = 0$ may
be found by putting $D_4 = 0$ in (26). Designating this minimum by b and
using (27) we have

(29) $b = \dfrac{a_2 a_4 - a_3^2}{a_1 a_2 - a_0 a_3},$

where the numerator is intrinsically positive, and the denominator is positive
for the class of distributions we are now considering, though we might equally
well consider the opposite distributions, for which it is negative. We have also
from (20),

(30) $(x + y)(y + z)(z + x) = -b.$

Substituting for each of these binomials its value as given by (28), we may write
this in the simpler form

(31) $xyz = b > 0.$

It was shown at the beginning of section 3 that when there are only three
values in the distribution, with frequencies k for x, m for y, and n for z,

(32) $a = kmnP^2 = kmn(x - y)^2 (y - z)^2 (z - x)^2.$

The first two rows of (17) form a matrix such that the sum of the squares of
its two-rowed determinants is

(33) $mn(y^2 - z^2)^2 + nk(z^2 - x^2)^2 + km(x^2 - y^2)^2.$

Since this is equal to the determinant of the sums of products of the rows,
namely

$\begin{vmatrix} a_0 & a_2 \\ a_2 & a_4 \end{vmatrix},$

it follows from (12), (32) and (33) that

(34) $\mu = \dfrac{(y + z)^2}{k(x - y)^2 (x - z)^2} + \dfrac{(z + x)^2}{m(x - y)^2 (y - z)^2} + \dfrac{(x + y)^2}{n(x - z)^2 (y - z)^2}.$

It is desired to minimize this expression, which is the factor of the variance
that is independent of the accuracy of the individual observations, while
holding b = xyz fixed; or to minimize b while holding $\mu$ fixed. In either case the
values of x, y and z are to be chosen to satisfy (28). The relations established
by the solution of either of these virtually equivalent problems will fix x, y, and z
except for a factor of proportionality, which must then be adjusted to provide
a balance as satisfactory as possible between random errors and bias.
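The closed forms (29), (31) and (34) can be compared with their definitions in terms of power sums. The sketch below is illustrative: the design 2, -3, -6 with frequencies 1, 2, 3 is a hypothetical choice satisfying (28) with $\nu > 0$, and the function name is an assumption made here.

```python
from fractions import Fraction

def design_summary(x, y, z, k, m, n):
    """mu from (12) and from (34), and b from (29), for a three-point design."""
    pts = [(Fraction(x), k), (Fraction(y), m), (Fraction(z), n)]
    a = [sum(f * v ** j for v, f in pts) for j in range(5)]   # power sums a_0..a_4
    det_a = (a[0] * (a[2] * a[4] - a[3] ** 2)
             - a[1] * (a[1] * a[4] - a[2] * a[3])
             + a[2] * (a[1] * a[3] - a[2] ** 2))
    mu_12 = (a[0] * a[4] - a[2] ** 2) / det_a
    x, y, z = (v for v, _ in pts)
    mu_34 = ((y + z) ** 2 / (k * (x - y) ** 2 * (x - z) ** 2)
             + (z + x) ** 2 / (m * (x - y) ** 2 * (y - z) ** 2)
             + (x + y) ** 2 / (n * (x - z) ** 2 * (y - z) ** 2))
    b_29 = (a[2] * a[4] - a[3] ** 2) / (a[1] * a[2] - a[0] * a[3])
    return mu_12, mu_34, b_29

mu_12, mu_34, b = design_summary(2, -3, -6, 1, 2, 3)
print(mu_12, mu_34, b)
```

Formula (34) agrees with (12) for any three distinct values, whereas the equality of (29) with xyz depends on the values satisfying (28).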

5. The quintic bias. Effect of $\delta b_2$. With any distribution determined in
this way will be associated its opposite distribution, which will have the same
minimizing properties so far as the variance and the cubic and quartic
components of bias are concerned. The appropriate choice between these two
opposite distributions will in general involve the quintic component of the bias.
At this point we must, for the first time, take account of the errors in the
denominator $b_2$ of $x_0$.

Since $b_1$ converges stochastically to $Eb_1$, and $b_2$ to $Eb_2$, the error $x_0 = -\tfrac{1}{2} b_1/b_2$
converges stochastically (for large samples) to $-\tfrac{1}{2} Eb_1/Eb_2$. By keeping our
values of x close enough to $\xi$ we may insure that $Eb_2$ differs as little as we please
from $\beta_2$, and hence that the series

$\dfrac{Eb_1}{Eb_2} = \dfrac{Eb_1}{\beta_2 + E\delta b_2} = \dfrac{Eb_1}{\beta_2}\left\{1 - \dfrac{E\delta b_2}{\beta_2} + \dfrac{(E\delta b_2)^2}{\beta_2^2} - \cdots\right\}$

converges rapidly. Let us rearrange this series after inserting for $Eb_1$ and $E\delta b_2$
their values, so as to obtain a series in ascending powers of a common
multiplier g which may be applied to the values of x. We recall that in the expression
(13) for $Eb_1 = E\delta b_1$, $B_3$ is of the second order in g, $B_4$ is of the third order, $B_5$ is of the
fourth order, and so forth. In the same way, we find that

$E\delta b_2 = C_3\beta_3 + C_4\beta_4 + \cdots,$

where

$C_3 = \dfrac{1}{a}\begin{vmatrix} a_0 & a_1 & a_3 \\ a_1 & a_2 & a_4 \\ a_2 & a_3 & a_5 \end{vmatrix}$

is of the first order, $C_4$ is of the second order, and so forth. Thus in

$\beta_2\,\dfrac{Eb_1}{Eb_2} = B_3\beta_3 + \left(B_4\beta_4 - \dfrac{B_3 C_3 \beta_3^2}{\beta_2}\right) + \left(B_5\beta_5 - \dfrac{B_4 C_3 \beta_3\beta_4}{\beta_2} - \dfrac{B_3 C_4 \beta_3\beta_4}{\beta_2} + \dfrac{B_3 C_3^2 \beta_3^3}{\beta_2^2}\right) + \cdots$

the first term is of the second order, those in the first parentheses are of the
third order, those in the second parentheses are of fourth order, and the
remaining terms are of higher orders.

We have seen that we can choose distributions for which $B_3 = 0$. In this
way we get rid of the second-order term and reduce the third-order terms to
$B_4\beta_4$. We shall in the next two sections show how, under various conditions,
to select from among the distributions for which $B_3 = 0$ an opposite pair for
each of which $|B_4|$ is a minimum. In choosing between these two opposite
distributions, the criterion we shall adopt is that the terms of third order and
those of fourth order shall have opposite signs; for while the fourth-order terms
may be made much smaller than those of third order in absolute value, still it
is desirable that they should offset them, in order to reduce the error. The
terms of third and of fourth orders reduce respectively for $B_3 = 0$ to $B_4\beta_4$ and
to $B_5\beta_5 - B_4 C_3 \beta_3\beta_4/\beta_2$. Our criterion is that these are to have opposite signs,
and consequently that

$B_4\beta_2\beta_4 (B_5\beta_2\beta_5 - B_4 C_3 \beta_3\beta_4) < 0.$

We shall however modify this criterion whenever $\sigma$ is not negligibly small.
A more precise criterion will be obtained by expanding $x_0$ in a series of powers
of $\delta b_2$, taking the expectation term by term, and reducing the moments thus
obtained of orders higher than the second to those of first and second orders by
means of the theory of the bivariate normal distribution of $b_1$ and $b_2$. It is
then necessary to make some assumption regarding the order of magnitude of
x, y and z relatively to N in order to assemble terms of like magnitude in a
criterion resembling that above but involving $\sigma$. The appropriate balance
indicated by the results of the next two sections calls for x, y and z to be of the
order of This leads to the following criterion; 

~ C'ajSiMV*) < 0. 

We have seen that $B_4 = b = xyz$. To evaluate $C_3$ and $B_5$, which latter
may in accordance with (15) be written

$B_5 = -\dfrac{1}{a}\begin{vmatrix} a_0 & a_2 & a_5 \\ a_1 & a_3 & a_6 \\ a_2 & a_4 & a_7 \end{vmatrix},$

we proceed as in section 3, replacing the second row of (17) by the first
powers to obtain $C_3$, and replacing the third row of (17) by the fifth powers
of x, y and z to obtain $B_5$. In this way we find

$C_3 = \dfrac{1}{P}\begin{vmatrix} 1 & 1 & 1 \\ x & y & z \\ x^3 & y^3 & z^3 \end{vmatrix}, \qquad B_5 = -\dfrac{1}{P}\begin{vmatrix} 1 & 1 & 1 \\ x^2 & y^2 & z^2 \\ x^5 & y^5 & z^5 \end{vmatrix}.$

Letting $\Sigma x$, $\Sigma x^2 yz$, etc. stand for the symmetric functions of x, y and z of which
one term is written in each case after $\Sigma$, we may reduce these expressions to

$C_3 = \Sigma x,$

$B_5 = -\Sigma x^3 y - \Sigma x^2 y^2 - 2\Sigma x^2 yz.$

With the help of (28) and (31) we find

$\Sigma x^2 yz = xyz\,\Sigma x = b\,\Sigma x,$

$\Sigma x^2 y^2 = (\Sigma xy)^2 - 2\Sigma x^2 yz = -2b\,\Sigma x,$

$\Sigma x^3 y = \Sigma xy\,\Sigma x^2 - \Sigma x^2 yz = -b\,\Sigma x.$

Therefore $B_5 = b\,\Sigma x$. Substituting these values for $B_4$, $C_3$ and $B_5$ in the
last inequality gives the rule:

Choose that one of a pair of opposite distributions for which 

(35) ix + y + z)|8,{b*/3403i/35 - fiipi) - /3. < 0. 

It will be remembered that $\beta_2$ is negative for a maximum of f(x), positive for a
minimum. The other $\beta$'s can only be estimated from preliminary experimentation,
or possibly in particular cases from general knowledge or theory.
Quite different algebraic methods are appropriate to minimizing $\mu$ with a fixed
b according to the limitations to be placed on the frequencies k, m, n; the methods
leading very simply to a solution in one case involve troublesome complications
in another. We shall deal with two of the leading cases.


6. The case of equal frequencies. Some experimental situations call for equal
frequencies for all values of the variable. If k = m = n, then $a_0 = N = 3n$.
Let $a_i' = a_i/n$. Then $a_0' = 3$ and $a_1' = \Sigma x$. Inasmuch as

(36) $\Sigma xy = 0$ and $xyz = b$,

we may express $a_2'$, $a_3'$ and $a_4'$ as functions of $a_1'$ and b as follows:

$$a_2' = \Sigma x^2 = (\Sigma x)^2 - 2\Sigma xy = a_1'^2,$$

$$a_3' = \Sigma x^3 = (\Sigma x)^3 - 3\Sigma x^2y - 6xyz;$$

and since $\Sigma x^2y = \Sigma x\,\Sigma xy - 3xyz$ we have from (36),

$$a_3' = a_1'^3 + 3b.$$

We have also

$$a_4' = \Sigma x^4 = (\Sigma x)^4 - 4\Sigma x^3y - 6\Sigma x^2y^2 - 12\Sigma x^2yz,$$

and since

$$\Sigma x^3y = \Sigma xy\,\Sigma x^2 - \Sigma x^2yz, \qquad \Sigma x^2yz = xyz\,\Sigma x = a_1'b,$$

$$\Sigma x^2y^2 = (\Sigma xy)^2 - 2\Sigma x^2yz = -2a_1'b,$$

it follows that

$$a_4' = a_1'^4 + 4a_1'b.$$

Therefore

$$a = n^3\begin{vmatrix} 3 & a_1' & a_1'^2 \\ a_1' & a_1'^2 & a_1'^3 + 3b \\ a_1'^2 & a_1'^3 + 3b & a_1'^4 + 4a_1'b \end{vmatrix}.$$

Upon subtracting $a_1'$ times the second column from the third, and $a_1'$ times the
first from the second, this becomes

$$a = n^3\begin{vmatrix} 3 & -2a_1' & 0 \\ a_1' & 0 & 3b \\ a_1'^2 & 3b & a_1'b \end{vmatrix} = -n^3b(4a_1'^3 + 27b).$$

Also,

$$a_0a_4 - a_2^2 = n^2\{3(a_1'^4 + 4a_1'b) - (a_1'^2)^2\} = 2n^2(a_1'^4 + 6a_1'b).$$

Hence, by (12),

(37) $$\mu = -\frac{2(a_1'^4 + 6a_1'b)}{nb(4a_1'^3 + 27b)}.$$
Differentiating with respect to $a_1'$ to find a minimum, we obtain

$$0 = (4a_1'^3 + 27b)(4a_1'^3 + 6b) - 12a_1'^2(a_1'^4 + 6a_1'b) = 4a_1'^6 + 60a_1'^3b + 162b^2.$$

The minimum of $\mu$, for b fixed, and satisfying the condition $4a_1'^3 + 27b < 0$,
which is equivalent to a > 0 since we assume b > 0, is attained when $a_1'^3 = bq$,
where q is the numerically greater root of the equation $4q^2 + 60q + 162 = 0$;
that is,

$$q = -(15 + \sqrt{63})/2 = -11.468\,626\,97.$$
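
The stationarity condition above can be checked numerically. The following is only a sketch: it assumes that $\mu$ has the form $-2(a_1'^4 + 6a_1'b)/\{nb(4a_1'^3 + 27b)\}$ obtained from the two determinants of this section, and the helper names `cbrt` and `mu_equal` and the trial value b = 1 are illustrative choices, not part of the paper.

```python
import math

def cbrt(t):
    """Real cube root, valid for negative arguments."""
    return math.copysign(abs(t) ** (1.0 / 3.0), t)

def mu_equal(a1, b, n=1.0):
    """Assumed form of mu: -2(a1^4 + 6 a1 b) / (n b (4 a1^3 + 27 b))."""
    return -2.0 * (a1 ** 4 + 6.0 * a1 * b) / (n * b * (4.0 * a1 ** 3 + 27.0 * b))

# Numerically greater root of 4 q^2 + 60 q + 162 = 0.
q = -(15.0 + math.sqrt(63.0)) / 2.0

b = 1.0                   # illustrative value of xyz (assumption)
a1_star = cbrt(b * q)     # stationary point, a1'^3 = b q
mu_star = mu_equal(a1_star, b)

# Nearby values of a1'; all of them still satisfy 4 a1'^3 + 27 b < 0.
nearby = [a1_star * (1.0 + e) for e in (-0.05, -0.01, 0.01, 0.05)]

print(f"q = {q:.8f}")
print(f"mu at a1'^3 = bq: {mu_star:.6f}")
for a1 in nearby:
    print(f"a1' = {a1:.5f}   mu = {mu_equal(a1, b):.6f}")
```

Every neighbouring value gives a larger $\mu$, as a minimum requires.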

The elementary symmetric functions of the values x, y, z composing the
distribution are

$$\Sigma x = a_1' = (bq)^{1/3}, \qquad \Sigma xy = 0, \qquad xyz = b.$$

Hence x, y and z must be the roots of the equation in u,

(38) $$u^3 - (bq)^{1/3}u^2 - b = 0.$$

If we put $u = -(bq)^{1/3}v$, this becomes

$$v^3 + v^2 + q^{-1} = 0.$$

Calculation gives approximately
$q^{-1} = -.087\,194\,396$, and for the roots of the equation in v,

(39) .2628, -.3729, -.8899,

numbers which are therefore proportional to the values of the variable that
should be chosen when the frequencies must be equal. If any values x, y, z
proportional to these are used, the value (37) of $\mu$ is

(40) $$\mu = -\frac{6}{N}\left(\frac{q}{b^2}\right)^{1/3}\frac{q + 6}{4q + 27},$$

and is the minimum consistent with any fixed value b of xyz.
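
A short numerical sketch of this step follows. It makes two assumptions that should be read as such: the substitution is taken with a minus sign, $u = -(bq)^{1/3}v$, so that the cubic in v is $v^3 + v^2 + q^{-1} = 0$ (this is what makes the product of the roots equal to $-q^{-1}$); and $\mu$ is taken to be $(a_0a_4 - a_2^2)/a$. The function `bisect` and the values b = 1, N = 3 are illustrative only.

```python
import math

def bisect(f, lo, hi, iters=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = -(15.0 + math.sqrt(63.0)) / 2.0

def cubic(v):
    return v ** 3 + v ** 2 + 1.0 / q

# One root in each bracket; the brackets were chosen from the signs of cubic().
roots = [bisect(cubic, 0.0, 1.0), bisect(cubic, -0.5, 0.0), bisect(cubic, -1.0, -0.5)]

b, N = 1.0, 3.0                       # illustrative: one observation per value
scale = abs(b * q) ** (1.0 / 3.0)     # equals -(bq)^(1/3), since bq < 0
x, y, z = (scale * v for v in roots)

# Power sums a_0 .. a_4 for one observation at each of x, y, z.
a = [sum(w ** p for w in (x, y, z)) for p in range(5)]
det_a = (a[0] * (a[2] * a[4] - a[3] ** 2)
         - a[1] * (a[1] * a[4] - a[3] * a[2])
         + a[2] * (a[1] * a[3] - a[2] ** 2))
mu_direct = (a[0] * a[4] - a[2] ** 2) / det_a

cbrt_q = -abs(q) ** (1.0 / 3.0)       # real cube root of q
mu_closed = -(6.0 / N) * cbrt_q * b ** (-2.0 / 3.0) * (q + 6.0) / (4.0 * q + 27.0)

print("roots in v:", [round(v, 4) for v in roots])
print("sum xy =", x * y + x * z + y * z, "  xyz =", x * y * z)
print("mu (determinants) =", mu_direct, "  mu (closed form) =", mu_closed)
```

The three roots agree with (39), the values satisfy (36), and the two evaluations of $\mu$ coincide.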

Choice of the factor of proportionality will involve a compromise between
the criteria of minimum sampling variance and minimum bias. If we ignore
components of bias of orders higher than the fourth and recall (10) and (11) it
will appear that the appropriate combined criterion is that

(41) $$\sigma^2\mu + \beta_4^2b^2$$

shall be a minimum. Putting for $\mu$ its value from (40) and differentiating
with respect to b gives

$$\frac{4\sigma^2}{N}\left(\frac{q}{b^5}\right)^{1/3}\frac{q + 6}{4q + 27} + 2\beta_4^2b = 0,$$

or

$$b = b' = \left\{-\frac{2q^{1/3}(q + 6)\sigma^2}{N\beta_4^2(4q + 27)}\right\}^{3/8}.$$

The product of the three roots (39) is $-q^{-1}$. Numbers proportional to them and
having the product b' will be obtained by multiplying them by $-(b'q)^{1/3}$, that
is, by

$$2.3318\left(\frac{\sigma^2}{N\beta_4^2}\right)^{1/8}.$$

Multiplying (39) by 2.3318 gives numbers

(42) .6128, -.8695, -2.0751,

which must still be multiplied by $\pm(\sigma^2/N\beta_4^2)^{1/8}$ to give the set minimizing
$E(\delta b_1)^2$. The ambiguous sign is to be fixed according to the rule at the end of
the last section. Thus we arrive finally at the conclusion:

If the numbers of observations are required to be the same for all the values of the
variable used, these values should for greatest efficiency deviate from the estimated
maximizing value by the products of the three numbers (42) by

(43) $$\pm\left(\frac{\sigma^2}{N\beta_4^2}\right)^{1/8},$$

choosing the ambiguous sign so as to satisfy (35).

The product b' of the three values is to be substituted for b in (40) and (35),
and the value of $\mu$ thus obtained from (40) is also to be substituted in (35).
These substitutions yield

(a; + 2/ + z)/3j^4 032^6 — 4/3j/34) < 0 

as the criterion for choosing the sign in (43). 

The expectation of the square of the error in the estimate $x_0$ of $\xi$
is, according to (9) and (10), given approximately by the ratio of (41) to $4\beta_2^2$,
and it is this that will be a minimum when the foregoing rule is followed. The
minimum of (41) is obtained by replacing b by b' in (40) and (41), and
substituting (40) for $\mu$ in (41). This gives $4\beta_4^2b'^2$;

that is,

(44) $$E(\delta b_1)^2 = 4.889\,\sigma^{3/2}\beta_4^{1/2}N^{-3/4}.$$
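
The constants 2.3318 and 4.889 of this section can be reproduced with a few lines of arithmetic. The sketch below assumes that the combined criterion (41) is $\sigma^2\mu + \beta_4^2b^2$ and that (40) has the form $\mu = c\,b^{-2/3}/N$; b is measured in units of $(\sigma^2/N\beta_4^2)^{3/8}$ and the criterion in units of $\sigma^{3/2}\beta_4^{1/2}N^{-3/4}$. The names `c`, `criterion` and `b_prime` are illustrative.

```python
import math

q = -(15.0 + math.sqrt(63.0)) / 2.0
cbrt_q = -abs(q) ** (1.0 / 3.0)                     # real cube root of q (q < 0)
c = -6.0 * cbrt_q * (q + 6.0) / (4.0 * q + 27.0)    # assumed (40): mu = c * b**(-2/3) / N

def criterion(b):
    """Assumed (41), sigma^2 mu + beta_4^2 b^2, in the scaled units described above."""
    return c * b ** (-2.0 / 3.0) + b ** 2

# Setting the derivative to zero gives b'^(8/3) = c / 3.
b_prime = (c / 3.0) ** (3.0 / 8.0)
scale = -cbrt_q * b_prime ** (1.0 / 3.0)            # multiplier -(b'q)^(1/3)
minimum = criterion(b_prime)

roots_39 = (0.2628, -0.3729, -0.8899)               # values printed in (39)
values_42 = [scale * v for v in roots_39]

print(f"multiplier = {scale:.4f}")
print("values (42):", [round(v, 4) for v in values_42])
print(f"minimum of (41) = {minimum:.4f}")
```

Under these assumptions the multiplier, the numbers (42) and the coefficient in (44) all agree with the printed values to the digits given.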


7. Adjustable frequencies. If the total number N of observations to be made
can be distributed freely among the values of the variable, the efficiency of the
experiment can be increased by a proper selection of the individual frequencies
k, m, n along with the corresponding values x, y, z. We shall choose these
six unknowns, subject to the three conditions*

(45) $k + m + n = N$,

(46) $xy + xz + yz = 0$,

(47) $xyz = -b$,

to minimize $\mu$. The last condition fixes the quartic bias, the preceding one
expresses the vanishing of the cubic bias. It is of course understood that k, m, n
are all positive, and we shall, as before, suppose initially that b is positive. No
two of x, y, z can be equal, and it follows that none of them, or of the sums
of two of them, can be zero while satisfying the second condition. We shall
lose no generality in assuming that

(48) $x > y > 0 > z$.

Furthermore, it is easy to see that x + y, x + z, and y + z are all positive.
Therefore the quantities

(49) $$r = \frac{y + z}{(x - y)(x - z)}, \qquad s = \frac{x + z}{(x - y)(y - z)}, \qquad t = \frac{x + y}{(x - z)(y - z)}$$

are all positive. From (34) we have

(50) $$\mu = \frac{r^2}{k} + \frac{s^2}{m} + \frac{t^2}{n}.$$

The values of k, m, n making this a minimum while themselves subject to the
limitation that their sum is N must if they were continuous positive variables
be proportional to r, s and t. Of course the frequencies are integers, but we are
supposing N large, so that the values found by differentiation will be close
approximations, and we shall disregard this complication. Put therefore

(51) $r = k\rho, \quad s = m\rho, \quad t = n\rho,$

where $\rho$ is a multiplier which evidently is not zero. If we use these equations
to eliminate r, s, t from $\mu$ we obtain, with the help of (45), $\mu = N\rho^2$. But if we
use them to eliminate k, m, n from (50) we have instead,

$$\mu = (r + s + t)\rho.$$

Now from (49),

(52) $r + t = s,$


^ The condition (47) is here used instead of (31), from which it differs by the introduction 
of the negative sign, because it simplifies the argument of this section slightly to have the 
quantities (49) positive. There is no essential difference, since we are seeking a pair of 
opposite distributions. 
so that $\mu = 2s\rho$. Therefore $N\rho = 2s$, and finally $\mu = 4s^2/N$. Therefore $\mu$ is
a minimum when the positive quantity s is a minimum. In the expression
(49) for s we substitute from (46) and (47)

(53) $$x + z = -xz/y = b/y^2, \qquad (x - y)(y - z) = (x + z)y - xz - y^2 = 2b/y - y^2,$$

so that

(54) $$s = \frac{b}{y(2b - y^3)}.$$

Since y, s and b are positive, this shows that $y^3 < 2b$. The value of y on the
interval from 0 to $(2b)^{1/3}$ making s a minimum is found by differentiation to be

$$y = (b/2)^{1/3}.$$

Substituting this in (53) and (47) gives

$$x + z = 2^{2/3}b^{1/3}, \qquad xz = -2^{1/3}b^{2/3},$$

whence

(55) $$x = (b/2)^{1/3}(1 + \sqrt{3}), \qquad y = (b/2)^{1/3}, \qquad z = (b/2)^{1/3}(1 - \sqrt{3}).$$

From (45), (51) and (52) it is seen that k + n = m = N/2. Thus half the total
observations are to be concentrated on the middle value. From (51) and (49)
we have also

$$\frac{k}{n} = \frac{r}{t} = \frac{y^2 - z^2}{x^2 - y^2},$$

wherefore

$$k = \frac{N}{2}\,\frac{y^2 - z^2}{x^2 - z^2}, \qquad n = \frac{N}{2}\,\frac{x^2 - y^2}{x^2 - z^2}.$$

With (55) this shows that

(56) $$k = N(2 - \sqrt{3})/8 = .03349\,N, \qquad m = N/2, \qquad n = N(2 + \sqrt{3})/8 = .46651\,N.$$

We have seen that $\mu = 4s^2/N$. Substituting in (54) the value found for y
gives $s = \tfrac{2}{3}(2/b)^{1/3}$. Therefore the minimum of $\mu$ for a fixed value of b is

(57) $$\mu = (16/9N)(2/b)^{2/3}.$$
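
The chain of results (49) to (57) lends itself to a direct numerical check. The sketch below takes the quantities r, s, t to be $(y+z)/\{(x-y)(x-z)\}$, $(x+z)/\{(x-y)(y-z)\}$ and $(x+y)/\{(x-z)(y-z)\}$ and $\mu$ to be $r^2/k + s^2/m + t^2/n$, which is the reading of (49) and (50) assumed here; the trial values b = 1.7 and N = 1000 are arbitrary.

```python
import math

b, N = 1.7, 1000.0                      # arbitrary illustrative values

c = (b / 2.0) ** (1.0 / 3.0)
x, y, z = c * (1.0 + math.sqrt(3.0)), c, c * (1.0 - math.sqrt(3.0))   # (55)

# Quantities (49), as assumed in the lead-in.
r = (y + z) / ((x - y) * (x - z))
s = (x + z) / ((x - y) * (y - z))
t = (x + y) / ((x - z) * (y - z))

rho = 2.0 * s / N                       # from N * rho = 2 s
k, m, n = r / rho, s / rho, t / rho     # (51): frequencies proportional to r, s, t
mu = r * r / k + s * s / m + t * t / n  # (50)

def s_of_y(yy):
    """(54): s as a function of y alone, for fixed b."""
    return b / (yy * (2.0 * b - yy ** 3))

print("xy + xz + yz =", x * y + x * z + y * z, "  xyz =", x * y * z)
print("r + t - s =", r + t - s)
print("k/N, m/N, n/N =", k / N, m / N, n / N)
print("mu =", mu, "  (16/9N)(2/b)^(2/3) =", 16.0 / (9.0 * N) * (2.0 / b) ** (2.0 / 3.0))
```

The conditions (46) and (47), the relation (52), the frequencies (56) and the value (57) of $\mu$ are all reproduced, and s is larger at neighbouring values of y.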

Inserting this in the expression (41) for the total expectation of the squared
error and then differentiating with respect to b gives

(58) $$b = 2^{7/4}\,3^{-9/8}\left(\frac{\sigma^2}{N\beta_4^2}\right)^{3/8}.$$
When this value is given to b, (41) becomes

(59) $$3.8207\,\sigma^{3/2}\beta_4^{1/2}N^{-3/4}.$$

The greater efficiency of experiments with the frequencies (56) and the
correspondingly adjusted values x, y, z, in comparison with the case in which the
frequencies must be equal, corresponds to the smaller coefficient in (59) than in
(44). To obtain as great accuracy with equal frequencies as with adjusted
ones it is necessary to have more observations, in a ratio obtained by equating
(59) with (44) after inserting different symbols for N in the two cases. In this
way it is found that the number of observations required with efficient
distribution of the frequencies is almost exactly 72 per cent of the number required
when the frequencies are equal, if the values x, y, z are in each case given their
most efficient values.
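
The comparison of the two designs can be sketched numerically. The calculation below assumes that both (44) and (59) are proportional to $N^{-3/4}$, so that equal accuracy requires numbers of observations in the ratio of the coefficients raised to the power 4/3; it also assumes the combined criterion $\sigma^2\mu + \beta_4^2b^2$ with $\mu = c\,b^{-2/3}/N$ in each case. The names `c_equal`, `c_adjust` and `min_criterion` are illustrative.

```python
import math

q = -(15.0 + math.sqrt(63.0)) / 2.0
cbrt_q = -abs(q) ** (1.0 / 3.0)

# mu = c * b**(-2/3) / N in both cases (assumed readings of (40) and (57)).
c_equal = -6.0 * cbrt_q * (q + 6.0) / (4.0 * q + 27.0)
c_adjust = (16.0 / 9.0) * 2.0 ** (2.0 / 3.0)

def min_criterion(c):
    """Minimise c*b**(-2/3) + b**2 over b; returns (b, minimum).

    Units: b in (sigma^2 / N beta_4^2)^(3/8), criterion in
    sigma^(3/2) beta_4^(1/2) N^(-3/4).
    """
    b = (c / 3.0) ** (3.0 / 8.0)
    return b, c * b ** (-2.0 / 3.0) + b ** 2

b_eq, coef_equal = min_criterion(c_equal)
b_adj, coef_adjust = min_criterion(c_adjust)

# Equal accuracy: coef_adjust * N_adj**(-3/4) = coef_equal * N_eq**(-3/4).
ratio = (coef_adjust / coef_equal) ** (4.0 / 3.0)

print(f"coefficient, equal frequencies      = {coef_equal:.4f}")
print(f"coefficient, adjustable frequencies = {coef_adjust:.4f}")
print(f"N_adjustable / N_equal              = {ratio:.4f}")
```

The ratio comes out close to 0.72, in agreement with the statement in the text.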

Substituting (58) in (55) gives the numbers

(60) 2.1520, .7877, -.2110,

multiplied by (43), with a change of signs if necessary to satisfy (35), as the
values x, y, z of the variable to be used. The more concentrated character of
this distribution with adjustable frequencies is emphasized by the small
proportion, less than 3½ per cent, of the frequencies (56) that pertains to the value most
remote from the tentative maximizing value.

When (58) is substituted in (57) and, with the result, in (35), this inequality
reduces to exactly the same form as that obtained in the preceding section for
fixing the sign of (43).

8. Introduction to the two-variable problem. Functions of two or more 
variables are of greater practical importance than functions of one variable. 
The recent work on factorial experiments [8] makes it clear that in the experi- 
mental determination of maxima of functions of several variables, considerable 
improvements are possible over the practice of trying the effect of variations 
in only one variable at a time while holding the others constant. It seems likely 
that the methods worked out in the previous sections for experimenting with 
one variable are capable of generalization. However certain difficulties enter 
which have not yet been surmounted. The object of the present section is to 
indicate something of the nature of the problem of extending the foregoing 
results to two variables, x and y. 

Let us suppose that a quadratic regression equation,

$$Z = b_{00} + b_{10}x + b_{01}y + \tfrac{1}{2}(b_{20}x^2 + 2b_{11}xy + b_{02}y^2),$$

will be fitted by least squares to observations of z = f(x, y) based on N
combinations of x and y, each of which represents a point in a plane. Since there are
six coefficients to be determined, there must be at least six distinct points
$(x_1, y_1), \ldots, (x_6, y_6)$. The coefficients in the normal equations may be written
$a_{jk} = \Sigma x^jy^k$, so that $a_{00} = N$. The determinant

$$a = \begin{vmatrix}
a_{00} & a_{10} & a_{01} & a_{20} & a_{11} & a_{02} \\
a_{10} & a_{20} & a_{11} & a_{30} & a_{21} & a_{12} \\
a_{01} & a_{11} & a_{02} & a_{21} & a_{12} & a_{03} \\
a_{20} & a_{30} & a_{21} & a_{40} & a_{31} & a_{22} \\
a_{11} & a_{21} & a_{12} & a_{31} & a_{22} & a_{13} \\
a_{02} & a_{12} & a_{03} & a_{22} & a_{13} & a_{04}
\end{vmatrix}$$

must not vanish. Let the function under investigation be

$$f(x, y) = \sum_{j,k}\binom{j + k}{j}\frac{\beta_{jk}\,x^jy^k}{(j + k)!},$$

and suppose that $\beta_{10} = 0 = \beta_{01}$, so that the origin is the point sought at which
the first derivatives vanish. We shall assume that

$$\beta = \beta_{20}\beta_{02} - \beta_{11}^2 > 0, \qquad \beta_{20} < 0,$$

implying a definite maximum. The estimates $x_0$, $y_0$ of the maximizing (or
minimizing) values obtained by differentiating Z are

$$x_0 = (b_{11}b_{01} - b_{02}b_{10})/b, \qquad y_0 = (b_{11}b_{10} - b_{20}b_{01})/b,$$

where

$$b = b_{20}b_{02} - b_{11}^2.$$

For large samples and values of x and y taken not too far from the origin, b will
approximate to $\beta$, and $x_0$ and $y_0$ respectively to

$$(\beta_{11}b_{01} - \beta_{02}b_{10})/\beta, \qquad (\beta_{11}b_{10} - \beta_{20}b_{01})/\beta.$$
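
As a small illustration of these formulae, the sketch below computes the stationary point of a fitted quadratic surface from its coefficients. The function name `stationary_point` and the numerical example are illustrative; the example surface is constructed to have its maximum at (1, 2).

```python
def stationary_point(b10, b01, b20, b11, b02):
    """Stationary point of Z = b00 + b10 x + b01 y + (b20 x^2 + 2 b11 x y + b02 y^2) / 2.

    Solves dZ/dx = b10 + b20 x + b11 y = 0 and dZ/dy = b01 + b11 x + b02 y = 0.
    """
    b = b20 * b02 - b11 ** 2
    if b == 0:
        raise ValueError("b20*b02 - b11^2 vanishes; no unique stationary point")
    x0 = (b11 * b01 - b02 * b10) / b
    y0 = (b11 * b10 - b20 * b01) / b
    return x0, y0

# Example: Z = -(x-1)^2 - (x-1)(y-2) - (y-2)^2, which has its maximum at (1, 2).
# Expanding gives b10 = 4, b01 = 5, b20 = -2, b11 = -1, b02 = -2.
x0, y0 = stationary_point(4.0, 5.0, -2.0, -1.0, -2.0)
is_maximum = (-2.0) * (-2.0) - (-1.0) ** 2 > 0 and -2.0 < 0   # b > 0 and b20 < 0

print("stationary point:", (x0, y0), " definite maximum:", is_maximum)
```

The two conditions checked at the end are the sample analogues of $\beta > 0$, $\beta_{20} < 0$.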


Some means is needed of combining into one the two desiderata of minimizing
the errors $x_0$ and $y_0$. A combined measure of these deviations is

$$\beta_{20}x_0^2 + 2\beta_{11}x_0y_0 + \beta_{02}y_0^2.$$

This expression is constant except for terms of higher order when $x_0$ and $y_0$,
while remaining small, vary in such a way that f(x, y) maintains a constant
value. Substituting in it the approximate values of $x_0$ and $y_0$ gives $\beta^{-1}$ times

$$\beta_{02}b_{10}^2 - 2\beta_{11}b_{10}b_{01} + \beta_{20}b_{01}^2.$$

The expectation of this measure of error may be separated into two parts by
means of the formulae for the variances and covariance,

$$\sigma_{b_{10}}^2 = Eb_{10}^2 - (Eb_{10})^2, \qquad \sigma_{b_{10}b_{01}} = Eb_{10}b_{01} - (Eb_{10})(Eb_{01}), \quad \text{etc.}$$

One of these parts is a generalized sampling variance,

$$\beta_{02}\sigma_{b_{10}}^2 - 2\beta_{11}\sigma_{b_{10}b_{01}} + \beta_{20}\sigma_{b_{01}}^2,$$

and tends to zero with order $N^{-1}$ as N increases provided the values $(x_k, y_k)$ are
fixed. The other part,

(61) $$\beta_{02}(Eb_{10})^2 - 2\beta_{11}(Eb_{10})(Eb_{01}) + \beta_{20}(Eb_{01})^2,$$

is a bias which does not tend to zero as N increases, but which may be kept
arbitrarily small, at the expense of the sampling variance, by restricting the
values $(x_k, y_k)$ to be sufficiently small. This expression is a negative definite
quadratic form in $Eb_{10}$ and $Eb_{01}$, and therefore cannot be zero unless both these
components of bias vanish separately.

We may proceed as in paragraph 2 to express $Eb_{10}$ and $Eb_{01}$ in terms of the
coefficients of f(x, y) of orders higher than the second, among which those of
third order will be of leading importance. In this way it may be shown that,
if we neglect terms in f(x, y) of orders higher than the third, $Eb_{10}$ and $Eb_{01}$ are
given by the ratios to a constant multiple of a of determinants obtained from
a by replacing respectively the second and the third columns by the column

$$\begin{matrix}
\beta_{30}a_{30} + 3\beta_{21}a_{21} + 3\beta_{12}a_{12} + \beta_{03}a_{03} \\
\beta_{30}a_{40} + 3\beta_{21}a_{31} + 3\beta_{12}a_{22} + \beta_{03}a_{13} \\
\beta_{30}a_{31} + 3\beta_{21}a_{22} + 3\beta_{12}a_{13} + \beta_{03}a_{04} \\
\beta_{30}a_{50} + 3\beta_{21}a_{41} + 3\beta_{12}a_{32} + \beta_{03}a_{23} \\
\beta_{30}a_{41} + 3\beta_{21}a_{32} + 3\beta_{12}a_{23} + \beta_{03}a_{14} \\
\beta_{30}a_{32} + 3\beta_{21}a_{23} + 3\beta_{12}a_{14} + \beta_{03}a_{05}.
\end{matrix}$$

It is desirable to select a distribution of points $(x_k, y_k)$ such that these
components of bias will vanish, no matter what may be the values of $\beta_{30}$, $\beta_{21}$, $\beta_{12}$ and
$\beta_{03}$. For this it is necessary and sufficient that all the determinants vanish
that are obtained from these two by replacing the column written above by the
terms in it that multiply any one of the four $\beta_{jk}$'s. The single-variable analogy
suggests using a distribution having the smallest possible number of points,
which in this case is six. Let us now take N = 6. The eight determinants will
all be multiples of

$$P = \begin{vmatrix}
1 & x_1 & y_1 & x_1^2 & x_1y_1 & y_1^2 \\
1 & x_2 & y_2 & x_2^2 & x_2y_2 & y_2^2 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_6 & y_6 & x_6^2 & x_6y_6 & y_6^2
\end{vmatrix}.$$

To save space we shall indicate determinants of this character merely by writing
a single row without subscripts, thus:

$$P = |\,1 \;\; x \;\; y \;\; x^2 \;\; xy \;\; y^2\,|.$$
If we define

$$\Delta_{jk} = |\,1 \;\; x^jy^k \;\; y \;\; x^2 \;\; xy \;\; y^2\,|,$$

$$\Delta'_{jk} = |\,1 \;\; x \;\; x^jy^k \;\; x^2 \;\; xy \;\; y^2\,|,$$

and multiply each of these determinants for which j + k = 3 (j, k = 0, 1, 2, 3)
by P, columns by columns, we shall have exactly the determinants whose
vanishing is the condition for nullification of the cubic bias. If we multiply P by
itself in the same way we have $P^2 = a$. Therefore $P \neq 0$. Therefore the
required condition is that the distribution satisfy the eight equations

$$\Delta_{30} = 0, \quad \Delta_{21} = 0, \quad \Delta_{12} = 0, \quad \Delta_{03} = 0,$$

$$\Delta'_{30} = 0, \quad \Delta'_{21} = 0, \quad \Delta'_{12} = 0, \quad \Delta'_{03} = 0,$$

and the inequality $P \neq 0$.

In seeking distributions nullifying the cubic bias we have twelve unknowns
$x_1, \ldots, x_6, y_1, \ldots, y_6$ which must satisfy these eight equations. This
suggests that we give arbitrary values to four of them and then solve for the other
eight by straightforward elimination. Unfortunately, since the eight equations
are each of the tenth degree, reducing to the ninth degree when coordinates
of two of the points are given numerical values, a straightforward elimination
would seem to lead to an equation of degree $9^8 = 43,046,721$. The number
of algebraic operations in performing the elimination, solving the equation for
one of the unknowns, substituting back, and solving for the others, would be a
large multiple of this number, and would doubtless be sufficient to occupy a
large and efficient computing project for many millenniums. At the end of this
period it might be found that the roots corresponding to the original arbitrary
values chosen were all complex or made P = 0, and were therefore unusable.
Thus indirect and less elementary methods are called for, and some qualitative
investigations of such distributions, if they exist (which is not certain), are in
order.

The set of conditions as a whole is invariant under all non-singular homogeneous
linear transformations of x and y, as is easily proved by making linear
combinations of the columns of each of the determinants $\Delta_{jk}$, $\Delta'_{jk}$ and P, and
by making linear combinations of these determinants themselves. These
linear transformations leave the origin invariant. They have four degrees of
freedom, which is exactly the right number to take care of the excess of
unknowns over equations. This points to the possible existence of a finite number
of fundamental solutions, from which all solutions may be obtained by linear
homogeneous transformations. Geometrical properties of the configuration will
be represented by invariants under linear transformations. Thus the condition
$P \neq 0$ means that the six points must not all lie on any conic section. From
this it follows at once that no four of them can lie on a straight line, since this
line, with the line through the other two, would constitute a degenerate conic.
As a matter of fact, we can go further and prove that no three of the points
may lie on a straight line. In the proof of this and other properties of the
distribution it is convenient to use the arbitrariness provided by a linear
transformation to pass the axes (which may be oblique) through any two of the six
points, and then to adjust the scales of measurement so that the coordinates of
these points become (1, 0) and (0, 1), except that one of them might conceivably
be the origin. If three points are collinear, their line can be taken to be the
x-axis if it passes through the origin, or the line y = 1 if it does not. Even with
the help provided by such procedures the proofs are rather long, though
straightforward. We shall content ourselves here with stating, without proof, the
following properties necessary for sets of six points for which $P \neq 0$ and all
components of the cubic bias vanish:

No three of the points can lie on a straight line. 

No two straight lines through the origin can contain four of the points. 

No four of the points can lie on the vertices of a parallelogram. 

The set cannot consist of the origin and the vertices of a regular pentagon with 
center at the origin. 

These conditions have been established by calculations of a rather straight- 
forward and laborious sort, too long to be reproduced. 
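
The role of the determinant P can be illustrated with exact rational arithmetic. The sketch below is not a search for admissible six-point sets; it only checks, on hand-picked points, that P vanishes when the six points lie on a conic (including the degenerate conic formed by two lines) and that the square of P equals the determinant of the normal equations when N = 6. The function names and the example points are illustrative.

```python
from fractions import Fraction

def det(rows):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    size, sign, result = len(m), 1, Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return sign * result

def monomials(x, y):
    return [1, x, y, x * x, x * y, y * y]

def P(points):
    """The determinant | 1 x y x^2 xy y^2 | over six points."""
    return det([monomials(x, y) for x, y in points])

def normal_determinant(points):
    """Determinant a of the normal equations, entries a_jk = sum of x^j y^k."""
    rows = [monomials(x, y) for x, y in points]
    return det([[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)])

general = [(0, 0), (1, 0), (0, 1), (2, 1), (1, 3), (-1, 2)]           # not on a conic
on_circle = [(5, 0), (-5, 0), (0, 5), (0, -5), (3, 4), (-4, 3)]       # x^2 + y^2 = 25
four_collinear = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1), (2, 5)]     # four on y = 0

print("P(general)        =", P(general))
print("P^2 - a (general) =", P(general) ** 2 - normal_determinant(general))
print("P(on_circle)      =", P(on_circle))
print("P(four_collinear) =", P(four_collinear))
```

Exact fractions are used so that a vanishing determinant is reported as exactly zero rather than as a small rounding residue.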

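If $z_k = x_k + iy_k$ and $\bar{z}_k = x_k - iy_k$, the conditions $P \neq 0$, $\Delta_{jk} = 0 = \Delta'_{jk}$,
may be written

$$|\,1 \;\; z \;\; \bar{z} \;\; z^2 \;\; z\bar{z} \;\; \bar{z}^2\,| \neq 0, \qquad |\,1 \;\; z^j\bar{z}^k \;\; \bar{z} \;\; z^2 \;\; z\bar{z} \;\; \bar{z}^2\,| = 0, \qquad |\,1 \;\; z \;\; z^j\bar{z}^k \;\; z^2 \;\; z\bar{z} \;\; \bar{z}^2\,| = 0.$$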

9. Some further unsolved problems. Since it is useful to demarcate the
frontiers of knowledge by pointing out what lies a little outside them as well 
as what is within, a few of the many questions may be mentioned which this 
paper falls short of answering. Besides the extension to two variables men- 
tioned in the last section, and to an arbitrary number of variables, it is desirable 
that the whole theory should be developed from an exact, or small-sample, 
point of view rather than on the basis of the large-sample approximations used 
here. This however appears to be an extremely large enterprise. A simpler, 
but still quite difficult, problem is to modify the criteria obtained in paragraphs 
6 and 7 so as to fit problems of economic experimentation, such as those of 
determination of maximum monopoly profit or minimum cost, in which the cost 
of each observation consists largely of the lost profit, or excess cost over the 
minimum, occasioned by the deviation from the value sought. In such a case 
the limitation of cost replaces the limitation of the total number of observations. 

Another important problem is to take account of the inaccuracy of the pre- 
liminary information on which the design of the experiment is based, and to 
utilize the relations thus involved to design efficient sequences of experiments. 

Determination of limits of error in terms of the maxima over an interval of the 
derivatives of f{x) should be a fairly straightforward problem in analysis and 
have practical importance. With this are associated various problems dealing 
with maxima of functions having discontinuities in the first or higher derivatives 
at or near the maximum. 
An important extension would deal with the case in which the maximum is
estimated from a least-squares polynomial of degree three or more. This might
be connected with the difficult wider problem of deciding on the degree of a
polynomial to be fitted in a particular case.

10. Summary. In determining the value $\xi$ of x for which f(x) is a maximum
or minimum, a quadratic polynomial may be fitted to observations made for
chosen values of x. The errors considered are of two kinds: sampling errors
resulting from the inaccuracy in each observation, which diminish as the number
of observations is increased, but increase if the values of x are chosen too close
to the value sought; and biased errors resulting from the fact that f(x) is not
truly quadratic, which do not decrease when the number of observations
increases with a fixed set of values of x, but do decrease when the deviations of x
from the value sought are reduced. The biased errors may be separated into
components corresponding to the third, fourth and higher powers of $x - \xi$ in
the expansion of f(x), and these components will ordinarily be of diminishing
importance as we go on in the sequence. However it is possible to choose values
of x making the cubic component zero and the quartic component at the same
time a minimum. Such a set consists of only three values of x. These values
may be further adjusted to minimize the expectation of the square of the total
error in $\xi$, as far at least as the term of fourth order in the bias, by a proper
balance between the sampling variance and the quartic bias. The values of x
satisfying these conditions, measured from the true maximizing or minimizing
value $\xi$, are the products of $(\sigma^2/N\beta_4^2)^{1/8}$ by the values u in the table below.

Since the root will usually be extracted by logarithms, the common logarithms 
of the values are given. The first set are the most efficient when the frequencies 
must be equal. The second set is appropriate when the frequencies are made 
proportional to the quantities in the last column; in this case only about 72 per 
cent as many observations are required for any specified accuracy as when the 
frequencies must be equal. The approximate expected squared errors in the 
estimates of $\xi$ in the two cases are given respectively by formulae (44) and (59).
All these results are approximations of the kind appropriate to large numbers of 
observations. 


      Equal frequencies              Adjustable frequencies

        u       log10 u          u       log10 u      Frequency

     -.6128     -.21267       -.2110     -.67572      .46 651 N
      .8695     -.06071        .7877     -.10364      .50 000 N
     2.0751      .31704       2.1520      .33284      .03 349 N


The signs of u should be reversed if      > 0. Here $\beta_k$ is the
coefficient of $(x - \xi)^k$ in the expansion of f(x), and $\sigma^2$ is the error variance of
an individual observation. For designing an efficient experiment it is necessary
to have some knowledge of these quantities. It may be gained from preliminary
experiments of smaller scale.

A suitable preliminary experiment, where knowledge of the function is
extremely scanty, might consist of a fixed small number, greater than one, of
observations on f(x) corresponding to each of a set of six or more values of x in
arithmetic progression covering an interval that includes the value $\xi$ sought,
and selected with a view to getting $\xi$ in the center of it as nearly as possible.
A polynomial of the fifth degree at least should be fitted by least squares, in
which process all the quantities desired for the design of the later, larger
experiment can be estimated, together with their accuracies. Since the values of x
are taken in arithmetic progression, the fitting can be carried out with extreme
ease by the method of orthogonal polynomials.

Numerous subsidiary questions promise to have both practical importance 
and mathematical interest. 
1. Introduction. The words "routine analyses" are used to denote the
analyses performed by laboratories, frequently attached to industrial plants, and
distinguished by the following characteristics: (1) All the analyses or
measurements are of the same kind, for example, are designed to measure the sugar
content in beets or to determine the coordinate of a star. (2) The analyses are
carried out day after day using the same methods and the same instruments.
(3) While all the analyses are of the same kind, the quantity measured varies
from time to time and each such quantity is measured repeatedly n times,
where n represents some small number, 2, 3, 4, 5.

As an illustration we may consider the routine analyses of sugar beets
performed in the process of selection and breeding. A small section is cut out of
each of a great number of sugar beets expected to be suitable for further
breeding. It is crushed and its juice extracted to determine the sugar content of
each particular beet. From the juice available from each beet n samples are
taken and a determination of the sugar content is made from each. Thus, if
$\xi_i$ represents the sugar content of the section from the ith beet and there are
N beets, the laboratory will have to make nN analyses with their results $x_{i,1}$,
$x_{i,2}, \ldots, x_{i,n}$, representing the measurements of the same quantity $\xi_i$.
Obviously the sugar content $\xi_i$ referring to the ith beet need have no relation to
that of any other jth beet.

An essential point in the above description is that the number of measurements
referring to the same quantity $\xi_i$ is usually very small. For example, the
quantitative analyses of urine in certain clinics are performed only twice for
each patient, so that n = 2. Frequently, various practical considerations make
it impossible to increase this number n of analyses intended to measure the same
quantity $\xi_i$.

The smallness of n introduces difficulties in estimating $\xi_i$. It is usual to
consider $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$ as independent variables, varying normally about
$\xi_i$ with an unknown standard error $\sigma_i$. If they have to be used to estimate $\xi_i$,
then the confidence interval [1]* for $\xi_i$ will be determined by the familiar formula

(1) $$x_{i.} - s_it_\alpha(n) < \xi_i < x_{i.} + s_it_\alpha(n),$$

where $x_{i.}$ denotes the mean of the $x_{ij}$,

(2) $$s_i^2 = \sum_{j=1}^{n}(x_{ij} - x_{i.})^2/n(n - 1)$$

and $t_\alpha(n)$ is Fisher's t corresponding to the number of degrees of freedom n - 1
and to the chosen confidence coefficient $\alpha$. It is known [2] that if the estimate
of $\xi_i$ is based only on its direct measurements $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$, then the
confidence interval (1) can not be made any smaller; in fact, formula (1) gives the
shortest unbiased confidence interval for $\xi_i$. But if we try to substitute
appropriate numbers in (1) we get disconcerting results. Namely, if n = 2 and
$\alpha = .99$, then $t_\alpha(n) = 63.657$. If n is increased, the value of $t_\alpha(n)$ decreases
rapidly but for n = 5 it is still very considerable, $t_\alpha(5) = 4.604$, and consequently
the numerical confidence interval determined by (1) is frequently so broad that
it is devoid of practical value.
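
The two values of t quoted above can be recovered without tables. The following sketch integrates the density of Student's t by Simpson's rule and finds the two-sided critical value by bisection; it uses only the standard library. The function names, the number of Simpson steps and the upper search limit of 200 are illustrative assumptions (the limit merely needs to exceed the answer).

```python
import math

def t_density(t, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + t * t / df) ** (-(df + 1) / 2.0)

def central_probability(t, df, steps=4000):
    """P(|T| <= t), by Simpson's rule on [0, t] (steps must be even)."""
    if t == 0.0:
        return 0.0
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_density(i * h, df)
    return 2.0 * total * h / 3.0

def t_alpha(n, alpha, upper=200.0):
    """Two-sided critical value for n measurements, i.e. n - 1 degrees of freedom.

    Assumes the answer lies below `upper`.
    """
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if central_probability(mid, n - 1) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

t2 = t_alpha(2, 0.99)
t5 = t_alpha(5, 0.99)
print(f"n = 2: t = {t2:.3f}")
print(f"n = 5: t = {t5:.3f}")
```

For n = 2 the distribution is the Cauchy distribution, whose critical value is also available in closed form as $\tan(0.495\pi)$, which gives an independent check.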

The general conclusion is that, if n cannot be increased, satisfactory estimates
of $\xi_i$ can only be obtained when they are based on something else in addition to
the direct measurements $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$. This point was first noticed by
"Student" [3]. His method of avoiding the difficulty consists in assuming that
the accuracy of measurements performed in the same laboratory is constant
in time, so that $\sigma_1 = \sigma_2 = \cdots = \sigma_N = \sigma$. If this is true, then $s_0^2 = \Sigma s_i^2/N$ will
be an unbiased estimate of the variance of $x_{i.}$, based on N(n - 1) degrees of
freedom. If the past experience of the laboratory is of any size, as measured
by N, then the product N(n - 1) will be of considerable size and the confidence
interval for $\xi_i$

(3) $$x_{i.} - s_0t_\alpha(N(n - 1) + 1) < \xi_i < x_{i.} + s_0t_\alpha(N(n - 1) + 1)$$

will be much more satisfactory than (1).

The problem which arises is whether we are entitled to assume that $\sigma_1 =
\sigma_2 = \cdots = \sigma_N$. The first study of this problem seems to have been made by
Przyborowski [4] in a paper written in Polish. His findings, subsequently
reported [5] in English, show that, at least in certain cases, the accuracy of routine
analyses is quite difficult to keep constant. If it is not constant, then the
relative frequency of the cases where formula (3) gives correct statements about $\xi_i$
will generally be different from the expected $\alpha$.


* Figures in square brackets refer to the literature quoted at the end of the paper.
The procedure employed by Przyborowski to test whether $\sigma_1 = \sigma_2 = \cdots = \sigma_N$

consisted in considering the quantities $v_i = (n - 1)s_i^2$ and applying the $\chi^2$ test
to see whether they follow the same $\chi^2$ distribution with n - 1 degrees of freedom

(4) p(v) = 

with an unknown $\sigma$.

Just this point is to be the main subject of this paper. The $\chi^2$ test was
devised by Karl Pearson with no particular set of alternative hypotheses in view.
As a result we may expect that in many cases other tests may be devised which
would be more powerful. A number of such cases are already on record [6],
[7], [8].


2. Statistical hypothesis H to be tested. We shall consider the case where
we can observe the particular values of Nn random variables $x_{ij}$, i = 1, 2,
$\ldots$, N; j = 1, 2, $\ldots$, n, and we know that $x_{ij}$ is independent of $x_{kl}$ for $i \neq k$
and that

(5) $$p(x_{ij}) = \frac{1}{\sigma_i\sqrt{2\pi}}\,e^{-(x_{ij} - \xi_i)^2/2\sigma_i^2}$$

with unknown values of $\xi_i$ and $\sigma_i > 0$. The hypothesis H to be tested is that
$\sigma_1 = \sigma_2 = \cdots = \sigma_N = \sigma$ without specifying, however, the actual value of $\sigma$.
It will be noticed that this hypothesis has already been treated by a number
of authors [9]-[17]. The need for considering it again arises from the fact that
previously it was tested against the set of alternatives presuming that the $\sigma_1$,
$\sigma_2, \ldots, \sigma_N$ were positive constants having any values whatsoever. It seems
to the author that, in the present case, the set of alternatives should be different.
This will be explained in the next section. It follows that while the hypothesis
tested is the same as in the papers quoted above, the problem of testing it is
quite different.

Let us denote by E the whole set of Nn observable variables. If H is true
then their elementary probability law will be

(6) $$p(E \mid H) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{Nn} e^{-\sum_{i=1}^{N}\sum_{j=1}^{n}(x_{ij} - \xi_i)^2/2\sigma^2}.$$


3. General problem of similar regions. The development of the test will
follow the general lines explained elsewhere [18], [19], [20]. Denoting by $W$ the
$Nn$ dimensional space of the $x$'s, we want to determine a region $w$ in $W$ having
the following properties: (a) if the hypothesis tested is true then the probability
of $E$ falling in $w$ shall have some fixed value chosen in advance, e.g., $\epsilon = .05$ or
$\epsilon = .01$. This probability is known as the probability of an error of the first
kind. (b) If $H$ is not true then the probability of $E$ falling in $w$ as determined
by one of the alternative hypotheses (that we assume likely to be true when $H$
is false) shall be as large as possible in a sense that requires further explanation.
The probability with which this condition is concerned is the complement of the
probability of an error of the second kind. Once the region $w$ is chosen it will
be used to test $H$ in this way: if $E$ falls within $w$ then $H$ will be rejected.

In the present section we shall deal only with ways of satisfying condition
(a). The problem is similar to the one recently described by Hotelling [21].
The difficulty is that, if $H$ is true, the probability law of $E$ is given by (6) and
contains $N + 1$ unspecified parameters, "nuisance" parameters as Hotelling
very appropriately calls them. If we take just any region $w$ then it is most
likely that the probability of $E$ falling in it will vary with different values of
$\sigma, \xi_1, \ldots, \xi_N$. As a matter of fact, if we want the test to be absolutely most
powerful, or at least relatively so, we must determine not just one single region
satisfying (a) but actually all such regions or some broad family of them. From
these we shall then select one which seems most satisfactory from the point of
view of (b).

Systematic methods of determining regions of the above kind have already
been considered [18], [20], [2]. In these publications they are called "similar"
to the sample space $W$. The reason for this term is that the whole space $W$ does
possess the required properties with $\epsilon = 1$. In fact, whatever be the values of
the nuisance parameters, $\sigma, \xi_1, \ldots, \xi_N$, the probability of $E$ falling within $W$,
as calculated from (6), is perfectly determined and equals 1. Our problem is
to find a region $w$, part of $W$, with similar properties for $0 < \epsilon < 1$. However,
in many cases no such regions exist [22].

The general methods in the above publications are applicable in the present
case. However, a recent paper by Cramér and Wold [23] allows a slight
improvement in presenting the matter. As this is a little involved, it seems
desirable to take up the whole problem and present it anew.

Consider then the general case where the probability law of some $m$ observable
variables $y_1, y_2, \ldots, y_m$, say $p(E \mid \theta_1, \ldots, \theta_s)$, as specified by the hypothesis
tested, depends on $s$ nuisance parameters $\theta_1, \theta_2, \ldots, \theta_s$. Our problem will
consist of determining the necessary and sufficient conditions for a region $w$ to
be similar to the sample space with respect to all these parameters. We shall
assume that the probability law $p(E \mid \theta_1, \ldots, \theta_s)$ satisfies certain limiting
conditions.

Let

(8)  $$\varphi_i = \frac{\partial \log p}{\partial \theta_i}, \qquad \varphi_{i,j} = \frac{\partial^2 \log p}{\partial \theta_i\, \partial \theta_j}.$$

Assume that for all values of $i$ and $j = 1, 2, \ldots, s$

(9)  $$\varphi_{i,j} = A_{i,j} + \sum_{k=1}^{s} B_{i,j,k}\, \varphi_k$$

where the coefficients $A_{i,j}$ and $B_{i,j,k}$ are independent of the observable variables
$E$. Assume also that the probability law $p(E \mid \theta_1, \ldots, \theta_s)$ permits indefinite
differentiation under the sign of the integral taken over any fixed region $w$ in $W$.
It is easy to check that the probability law (6) satisfies all of these conditions.

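In order to find the necessary conditions for the region $w$ to be similar to $W$
with respect to $\theta_1, \theta_2, \ldots, \theta_s$, assume that $w$ is actually similar and that,
consequently,

(10)  $$P\{E \in w \mid \theta_1, \ldots, \theta_s\} = \int \cdots \int_w p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv \epsilon$$

for all possible values of $\theta_1, \theta_2, \ldots, \theta_s$. It follows that the derivatives of all
orders with respect to $\theta_1, \theta_2, \ldots, \theta_s$ taken from the left side of (10) must be
identically equal to zero. But we have

(11)  $$\frac{1}{\epsilon}\frac{\partial}{\partial \theta_i} \int \cdots \int_w p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m = \frac{1}{\epsilon} \int \cdots \int_w \frac{\partial p}{\partial \theta_i}\, dy_1 \cdots dy_m = \frac{1}{\epsilon} \int \cdots \int_w \varphi_i\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv 0$$

for $i = 1, 2, \ldots, s$.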

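Similarly, using (9)

(12)  $$\frac{1}{\epsilon}\frac{\partial^2}{\partial \theta_i\, \partial \theta_j} \int \cdots \int_w p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m = \frac{1}{\epsilon} \int \cdots \int_w \Big\{\varphi_i \varphi_j + A_{i,j} + \sum_{k=1}^{s} B_{i,j,k}\, \varphi_k\Big\}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv 0.$$

Using (10) and (11), the last identity will be reduced to

(13)  $$\frac{1}{\epsilon} \int \cdots \int_w \varphi_i \varphi_j\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv -A_{i,j} \quad \text{for } i, j = 1, 2, \ldots, s$$

where the right side does not depend on the particular region $w$, provided that
$w$ is similar to the sample space. Considering the identities (11) and (13)
which were obtained by differentiating (10) twice, we may guess what will
happen if we differentiate (13) again and again. We may assume, in fact, that,
whatever be the non-negative integers $k_1, \ldots, k_s$, we shall obtain

(14)  $$\frac{1}{\epsilon} \int \cdots \int_w \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv M(k_1, k_2, \ldots, k_s),$$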

where $M(k_1, \ldots, k_s)$ is independent of the particular region $w$, provided that $w$
is similar to the sample space with respect to all of the $\theta$'s. Assume that this is
found for all $k$'s such that $\sum_{i=1}^{s} k_i \le K$; also assume that the sum of the $k$'s in
(14) is exactly $K$. Differentiating with respect to $\theta_j$, we obtain

(15)  $$\frac{1}{\epsilon} \int \cdots \int_w \Big\{\varphi_j \prod_{i=1}^{s} \varphi_i^{k_i} + \sum_{t=1}^{s} k_t\, \varphi_t^{k_t - 1} \varphi_{t,j} \prod_{i \neq t} \varphi_i^{k_i}\Big\}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv \frac{\partial}{\partial \theta_j} M(k_1, \ldots, k_s).$$

Because of the particular form of $\varphi_{t,j}$, the second expression in the curly brackets
under the integral is a polynomial in the $\varphi$'s of order not exceeding $K$. According
to the assumption made, this expression multiplied by $p(E \mid \theta_1, \ldots, \theta_s)/\epsilon$ and
integrated over $w$ gives a result which is independent of $w$. As the right side of
(15) is also independent of $w$, we conclude that

(16)  $$\frac{1}{\epsilon} \int \cdots \int_w \varphi_j \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv M(k_1, \ldots, k_j + 1, \ldots, k_s)$$

is also independent of the particular similar region chosen. We have seen that
(14) is true for $K \le 2$ and that if it is true for $K$ it is true for $K + 1$, that is,
it is true in general.

We may now sum up our findings: if $w$ is a region similar to the sample space
with respect to all of the $\theta$'s and if $\epsilon$ denotes the value of the integral (10), then,
whatever be the non-negative integers $k_1, k_2, \ldots, k_s$, the value of the integral
on the left side of (14) is independent of the particular region $w$ chosen.

As the whole sample space $W$ is also "similar" with $\epsilon = 1$, it must satisfy this
identity. This allows us to determine the $M$'s, namely

(17)  $$\int \cdots \int_W \prod_{i=1}^{s} \varphi_i^{k_i}\, p(E \mid \theta_1, \ldots, \theta_s)\, dy_1 \cdots dy_m \equiv M(k_1, \ldots, k_s).$$

It is obvious that the necessary condition above is also sufficient. If a region
$w$ is such that (14) holds for all systems of non-negative integers then all the
derivatives of (10) must be identically zero; thus the left side of (10) is
independent of $\theta_1, \theta_2, \ldots, \theta_s$.

It will be useful to interpret the above conditions as follows. We start by
noticing that the left side of (17) represents the product moment of some
specified order of the $\varphi_1, \ldots, \varphi_s$ considered as random variables. We shall call
it the absolute product moment. We will now interpret the left side of (14)
as a product moment also. For this purpose we shall define a new elementary
probability law of the $y$'s to be denoted by $p(E \mid w, \theta_1, \ldots, \theta_s)$ and described
as the relative probability law given $w$. We shall write it as

(18)  $$p(E \mid w, \theta_1, \ldots, \theta_s) = \frac{1}{\epsilon}\, p(E \mid \theta_1, \ldots, \theta_s)$$

for all of the points $E$ included in $w$ and

(19)  $$p(E \mid w, \theta_1, \ldots, \theta_s) = 0$$

for all other points. With this definition the left side of (14) appears to be the
expectation of the product $\prod_{i=1}^{s} \varphi_i^{k_i}$ calculated from the relative probability
law of the $y$'s given $w$. We will call it the relative product moment given $w$.
The final result can now be stated as follows:

For a region $w$ to be similar to the sample space with respect to $\theta_1, \theta_2, \ldots, \theta_s$
it is necessary and sufficient that all the relative moments and product moments
of $\varphi_1, \ldots, \varphi_s$ shall equal the corresponding absolute moments and product
moments.

In order to make the method of constructing similar regions according to the 
above conditions clear we recall the procedure involved in the calculation of the 
probability laws of any given set of random variables. 

Assume then that the elementary probability law of the original variables is
given. Fix some values of the parameters $\theta_1, \theta_2, \ldots, \theta_s$, denote the resulting
probability law by $p(E)$, and consider the problem of finding the elementary
probability law of $\varphi_1, \ldots, \varphi_s$ considered as functions of the $y$'s. We shall
assume that none of the $\varphi$'s can be expressed as a function of the others not
involving the $y$'s explicitly so that the matrix

(20)  $$\begin{pmatrix} \dfrac{\partial \varphi_1}{\partial y_1} & \dfrac{\partial \varphi_1}{\partial y_2} & \cdots & \dfrac{\partial \varphi_1}{\partial y_m} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial \varphi_s}{\partial y_1} & \dfrac{\partial \varphi_s}{\partial y_2} & \cdots & \dfrac{\partial \varphi_s}{\partial y_m} \end{pmatrix}$$

is non-singular. In these circumstances it is possible to select $m - s$ functions
of the $y$'s, say $\psi_{s+1}, \psi_{s+2}, \ldots, \psi_m$, which have continuous second derivatives such
that the formulae

(21)  $$z_i = \varphi_i \quad (i = 1, 2, \ldots, s), \qquad z_j = \psi_j \quad (j = s + 1, \ldots, m)$$

determine a one-to-one transformation of the space $W$ of the $y$'s into the space
$W'$ of the $z$'s. If $w$ denotes any region in $W$ then it will be transformed into a
perfectly determined region $w'$ in $W'$. If $E'$ denotes a point in $W'$ then the
probability of $E'$ falling in $w'$ will be identical with that of $E$ falling in $w$. Thus

(22)  $$P\{E' \in w'\} \equiv P\{E \in w\} = \int \cdots \int_w p(E)\, dy_1 \cdots dy_m.$$

Letting $J$ be the Jacobian of the $y$'s with respect to the $z$'s in the transformation
(21) and using the known formulae for transforming multiple integrals, we have

(23)  $$P\{E' \in w'\} = \int \cdots \int_{w'} [p(E)]_{E'}\, |J|\, dz_1 \cdots dz_m,$$

where $[p(E)]_{E'}$ denotes the result of substituting the expressions for the $y$'s in
terms of the $z$'s as obtained from (21) into $p(E)$. It follows that, whatever be
the region $w'$ in $W'$, the probability of $E'$ falling in it is obtained by integrating
the function $[p(E)]_{E'}\, |J|$ over $w'$. But this means, according to the usual
definition, that the product $[p(E)]_{E'}\, |J|$ is the elementary probability law of
the $z$'s. Denoting it by $p(E') = p(z_1, \ldots, z_m)$ we have

(24)  $$p(E') = [p(E)]_{E'}\, |J|.$$

Now, to obtain the joint probability law of $\varphi_1, \ldots, \varphi_s$, or that of $z_1, \ldots, z_s$,
we must integrate $p(E')$ for all the other $z$'s between their extreme
limits, formally between $-\infty$ and $+\infty$ for each of the variables concerned.

(25)  $$p(\varphi_1, \ldots, \varphi_s) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} p(E')\, dz_{s+1} \cdots dz_m.$$

This procedure will be applied when calculating the absolute probability law
of the $\varphi$'s and also the relative one given $w$. The only difference will be that in
the latter case we shall have to start with (18) and (19) instead of the original
probability law. The space $W'$ and the transformation (21) will be the same
in both cases. It is important to be clear about the difference between the two
cases. This is connected with the difference between $p(E \mid \theta_1, \ldots, \theta_s)$ and
$p(E \mid w, \theta_1, \ldots, \theta_s)$ of (18) and (19). The latter is proportional to the former
at any point $E$ within the region $w$ but is zero outside of $w$. As mentioned
above, the integrations for $z_{s+1}, z_{s+2}, \ldots, z_m$ in (25) should extend formally
from $-\infty$ to $+\infty$ for each variable. However, the probability law $p(E')$ may
equal zero within certain parts of this range. Fixing any system of values
$z_i = \varphi_i$, for $i = 1, 2, \ldots, s$, is equivalent to fixing a hypersurface in the space $W$
and considering the intersection of planes $z_i = $ constant in the space $W'$.
Denote them by $W(\varphi)$ and $W'(\varphi)$ respectively. If we shift the point $E$ or $E'$
along $W(\varphi)$ or $W'(\varphi)$ respectively, the variables $z_j = \psi_j$, for $j = s + 1$,
$s + 2, \ldots, m$ will assume a certain set $S(\varphi)$ of systems of values. When
calculating the absolute probability law of $\varphi_1, \ldots, \varphi_s$ this set $S(\varphi)$ will be the real
region of integration in (25); outside of it the function under the integral sign
will be zero. On the other hand, when calculating the relative probability law
of $\varphi_1, \ldots, \varphi_s$ given $w$, the function under the integral (25) is zero as soon as
the point $E$ moves outside of the region $w$. Denote by $w(\varphi)$ that part of $W(\varphi)$
which is included in $w$ and by $w'(\varphi)$ the corresponding part of $W'(\varphi)$. So, the
absolute and the relative, given $w$, probability laws of $\varphi_1, \ldots, \varphi_s$ can be
obtained by using the formulae

(26)  $$p(\varphi_1, \ldots, \varphi_s) = \int \cdots \int_{W'(\varphi)} p(E')\, dz_{s+1} \cdots dz_m$$

(27)  $$p(\varphi_1, \ldots, \varphi_s \mid w) = \frac{1}{\epsilon} \int \cdots \int_{w'(\varphi)} p(E')\, dz_{s+1} \cdots dz_m.$$


Now the method of constructing regions similar to $W$ with respect to $\theta_1$,
$\theta_2, \ldots, \theta_s$ is clear: to construct any such region it is necessary and sufficient
to select for each of all possible systems of values of $\varphi_1, \varphi_2, \ldots, \varphi_s$ a part $w(\varphi)$
of the hypersurface $W(\varphi)$ and to combine all these parts. The selection of $w(\varphi)$
is arbitrary save for the restriction that the probability law (27) have all its
moments equal to those of (26), identically in the $\theta$'s. This last condition will
certainly be satisfied if $w(\varphi)$ is so selected that for almost all systems of values
of $\varphi_1, \ldots, \varphi_s$

(28)  $$p(\varphi_1, \ldots, \varphi_s \mid w) \equiv p(\varphi_1, \ldots, \varphi_s)$$

for all values of the $\theta$'s.

By selecting $w(\varphi)$ in all possible ways that satisfy (28) we obtain an infinity of
regions similar to $W$ with respect to $\theta_1, \theta_2, \ldots, \theta_s$. They form a family which
we shall denote by $F(\epsilon)$. However, it is known that in general all the moments
of $p(\varphi_1, \ldots, \varphi_s \mid w)$ and $p(\varphi_1, \ldots, \varphi_s)$ may be identical without the two
probability laws being equal almost everywhere. In such cases, the family $F(\epsilon)$ will
not exhaust all the similar regions. It is important to be able to state whether
or not $F(\epsilon)$ contains all the similar regions. To ascertain this we may use the
conditions of Cramér and Wold [23] which are sufficient for the determinateness
of the problem of moments, that is, for the uniqueness of a function having a
given set of moments.

Let

(29)  $$\mu_{2\nu} = M(2\nu, 0, 0, \ldots, 0) + M(0, 2\nu, 0, \ldots, 0) + \cdots + M(0, 0, \ldots, 0, 2\nu).$$

With this notation the conditions of Cramér and Wold can be stated as follows:
If any two probability laws, e.g., the probability laws $p(\varphi_1, \ldots, \varphi_s \mid w)$ and
$p(\varphi_1, \ldots, \varphi_s)$, have all their moments and all their product moments identical
and if the series

(30)  $$\sum_{\nu=1}^{\infty} \mu_{2\nu}^{-1/2\nu}$$

is divergent, then

(31)  $$p(\varphi_1, \ldots, \varphi_s \mid w) = p(\varphi_1, \ldots, \varphi_s)$$

almost everywhere.

Therefore, to know whether the family $F(\epsilon)$ defined above exhausts all the
regions similar to $W$, we must calculate the even moments of all the $\varphi_i$ and see
whether the series (30) depending on these moments is divergent. If it is, there
is no similar region outside the family $F(\epsilon)$. Otherwise, there may be some
others. These others will be constructed by selecting $w(\varphi)$'s such that the
integral (27) equals any other probability law having the same moments as (26).
In such cases, a region $w$ selected, in one way or another, from the family $F(\epsilon)$
as the best from the point of view of controlling errors of the second kind will
only be the relative best.

It should be mentioned that whether we can always, under the conditions
considered, select a $w(\varphi)$ on any $W(\varphi)$ that satisfies the identity (28) has not
yet been proved. However, it seems plausible that the differential equations (9)
imply the existence of a sufficient set of statistics for $\theta_1, \theta_2, \ldots, \theta_s$. If this is
so, the possibility of satisfying (28) is guaranteed (see [2], p. 366).

4. Regions similar to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$.
We may now return to the original problem and apply our theory to the
probability law (6). We wish to construct the most general regions similar to the
sample space with respect to the nuisance parameters $\sigma, \xi_1, \ldots, \xi_N$ unspecified
by the hypothesis tested. We let

(32)  $$\varphi_\sigma = \frac{\partial \log p}{\partial \sigma} = -\frac{Nn}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{N} \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2,$$

(33)  $$\varphi_i = \frac{\partial \log p}{\partial \xi_i} = \frac{n(x_{i\cdot} - \xi_i)}{\sigma^2} \quad \text{with } x_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} x_{i,j}.$$

Then

(34)  $$\frac{\partial \varphi_\sigma}{\partial \sigma} = -\frac{3}{\sigma}\varphi_\sigma - \frac{2Nn}{\sigma^2}, \qquad \frac{\partial \varphi_i}{\partial \sigma} = -\frac{2}{\sigma}\varphi_i, \qquad \frac{\partial \varphi_i}{\partial \xi_i} = -\frac{n}{\sigma^2},$$

and we see that the probability law (6) satisfies the differential equations (9).
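
The relations (32) to (34) can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the normal law (6), compares central finite differences of $\log p$, $\varphi_\sigma$ and $\varphi_i$ with the closed forms above, and uses hypothetical helper names (`log_p`, `phi_sigma`, `phi_i`, `max_gap_in_34`) chosen only for illustration.

```python
import math
import random


def sum_sq(x, xi):
    """Sum over i, j of (x[i][j] - xi[i])**2."""
    return sum((v - xi[i]) ** 2 for i, row in enumerate(x) for v in row)


def log_p(x, sigma, xi):
    """Logarithm of the probability law (6)."""
    count = len(x) * len(x[0])  # Nn
    return (-count * math.log(sigma * math.sqrt(2.0 * math.pi))
            - sum_sq(x, xi) / (2.0 * sigma ** 2))


def phi_sigma(x, sigma, xi):
    """Closed form (32)."""
    count = len(x) * len(x[0])
    return -count / sigma + sum_sq(x, xi) / sigma ** 3


def phi_i(x, sigma, xi, i):
    """Closed form (33)."""
    n = len(x[i])
    return n * (sum(x[i]) / n - xi[i]) / sigma ** 2


def diff(f, t, step=1e-5):
    """Central finite difference of f at t."""
    return (f(t + step) - f(t - step)) / (2.0 * step)


def max_gap_in_34(x, sigma, xi):
    """Largest gap between finite differences and formulas (32)-(34)."""
    count, n = len(x) * len(x[0]), len(x[0])
    gaps = [
        abs(diff(lambda s: log_p(x, s, xi), sigma) - phi_sigma(x, sigma, xi)),
        abs(diff(lambda s: phi_sigma(x, s, xi), sigma)
            + 3.0 / sigma * phi_sigma(x, sigma, xi) + 2.0 * count / sigma ** 2),
    ]
    for i in range(len(x)):
        def shifted(t, i=i):
            # Copy of xi with the i-th mean replaced by t.
            xi_t = list(xi)
            xi_t[i] = t
            return xi_t
        gaps.append(abs(diff(lambda t: log_p(x, sigma, shifted(t)), xi[i])
                        - phi_i(x, sigma, xi, i)))
        gaps.append(abs(diff(lambda s: phi_i(x, s, xi, i), sigma)
                        + 2.0 / sigma * phi_i(x, sigma, xi, i)))
        gaps.append(abs(diff(lambda t: phi_i(x, sigma, shifted(t), i), xi[i])
                        + n / sigma ** 2))
    return max(gaps)


rng = random.Random(0)
data = [[rng.gauss(m, 1.5) for _ in range(4)] for m in (0.0, 2.0, -1.0)]
print(max_gap_in_34(data, 1.3, [0.2, 1.7, -0.8]))
```

The printed gap should be at the level of finite-difference error only, which is what one expects if (34) expresses the second derivatives of $\log p$ linearly in the $\varphi$'s, as required by (9).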

Now the hypersurfaces $W(\varphi)$ of the theory are the intersections of the
hypersurfaces

(35)  $\varphi_\sigma = $ constant and $\varphi_i = $ constant, for $i = 1, 2, \ldots, N$.

The latter equations are clearly equivalent to

(36)  $x_{i\cdot} = $ constant.

As to the former, we notice the identity

(37)  $$\sum_{i=1}^{N} \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2 = n \sum_{i=1}^{N} \left(S_i^2 + (x_{i\cdot} - \xi_i)^2\right) = \chi^2 \text{ (say)}$$

where $nS_i^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2$. Therefore, $W(\varphi)$ denotes the intersection of the
hypersurfaces (36) with, say,

(38)  $$T_1 = \sum_{i=1}^{N} S_i^2 = \text{constant}.$$
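
As a small illustration of the identity (37), the sketch below evaluates both sides on arbitrary data and returns their difference. The function name `identity_37_gap` and the sample values are assumptions made for the example only.

```python
import random


def identity_37_gap(x, xi):
    """Absolute difference between the two sides of identity (37).

    x[i][j] are the observations, xi[i] the assumed group means.
    """
    n = len(x[0])
    # Left side: total sum of squares about the xi's.
    lhs = sum((v - xi[i]) ** 2 for i, row in enumerate(x) for v in row)
    # Right side: n * sum_i (S_i^2 + (x_i. - xi_i)^2), with n S_i^2 the
    # within-group sum of squares about the group mean x_i.
    rhs = 0.0
    for i, row in enumerate(x):
        mean = sum(row) / n
        s2 = sum((v - mean) ** 2 for v in row) / n
        rhs += n * (s2 + (mean - xi[i]) ** 2)
    return abs(lhs - rhs)


rng = random.Random(3)
sample = [[rng.gauss(0.0, 2.0) for _ in range(5)] for _ in range(4)]
print(identity_37_gap(sample, [0.5, -1.0, 0.0, 2.0]))
```

The gap is zero up to rounding for any choice of the $\xi_i$, which is why fixing $\varphi_\sigma$ together with all the $x_{i\cdot}$ amounts to fixing $T_1$.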




If we succeed in selecting from each hypersurface $W(\varphi)$ a part $w(\varphi)$ satisfying
condition (28) identically then the sum of all such regions $w(\varphi)$ will form a
region $w$ similar to $W$ with respect to all the unspecified parameters and
belonging to the family $F(\epsilon)$. Before proceeding to this stage of the solution, let us see
whether the family $F(\epsilon)$ exhausts all of the similar regions.

For this purpose notice first that instead of considering whether there is but
one probability law with moments equal to those of $\varphi_\sigma$ and the $\varphi_i$ it is
sufficient to concern ourselves with the moments of $\chi^2$ and $x_{i\cdot}$. In fact, all the $\varphi$'s
are functions of these variables and the problem of uniqueness of the distribution
must have the same answer in both cases. The $2\nu$th absolute moment of $\chi^2$
as calculated from (6) equals

(39)  $$(2\sigma^2)^{2\nu}\, \Gamma(\tfrac{1}{2}Nn + 2\nu)/\Gamma(\tfrac{1}{2}Nn).$$

The same order moment of $x_{i\cdot}$, taken about $\xi_i$, is

(40)  $$\sigma^{2\nu} (2\nu)!/(2n)^{\nu}\, \nu!.$$


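Thus, the quantity denoted by $\mu_{2\nu}$ in the theory becomes

(41)  $$\mu_{2\nu} = (2\sigma^2)^{2\nu}\, \frac{\Gamma(\frac{1}{2}Nn + 2\nu)}{\Gamma(\frac{1}{2}Nn)} + N \left(\frac{\sigma^2}{n}\right)^{\nu} \frac{(2\nu)!}{2^{\nu}\, \nu!}.$$

We are interested in whether or not the series (30) is divergent. Since $\mu_{2\nu}$
satisfies the inequality

(42)  $$\mu_{2\nu} < a^{2\nu}\, \Gamma(b + 2\nu) = C_{2\nu} \text{ (say)}$$

with $a = 2\sigma^2 + N$ and $2b = Nn$, if we prove that the series $\sum C_{2\nu}^{-1/2\nu}$ diverges, then
(30) also diverges. To settle this conveniently we apply Stirling's formula to
$\Gamma(b + 2\nu)$ and find that, as $\nu \to \infty$, the ratio $C_{2\nu}^{-1/2\nu}/\nu^{-1}$ tends to a finite
positive limit. As the series $\sum \nu^{-1}$ is divergent, so is the series $\sum C_{2\nu}^{-1/2\nu}$ and thus
the series $\sum \mu_{2\nu}^{-1/2\nu}$ is divergent. Therefore, there is but one probability law with
moments identical to those of $\chi^2$ and the $x_{i\cdot}$'s and so the family $F(\epsilon)$ contains all
the regions similar to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$.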
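
The Stirling argument can be illustrated numerically. The sketch below is only a check under stated assumptions: it takes $C_{2\nu} = a^{2\nu}\Gamma(b + 2\nu)$ as in (42), works with logarithms through `math.lgamma` to avoid overflow, and uses illustrative values $a = 5$, $b = 6$ (for instance $\sigma = 1$, $N = 3$, $n = 4$). Stirling's formula suggests that $\nu\, C_{2\nu}^{-1/2\nu}$ approaches $e/2a$.

```python
import math


def term(nu, a, b):
    """C_{2 nu}^(-1/(2 nu)) with C_{2 nu} = a^(2 nu) * Gamma(b + 2 nu)."""
    log_c = 2.0 * nu * math.log(a) + math.lgamma(b + 2.0 * nu)
    return math.exp(-log_c / (2.0 * nu))


def ratio(nu, a, b):
    """Ratio of the term to nu^(-1), which should settle to a finite limit."""
    return term(nu, a, b) * nu


def partial_sum(upto, a, b):
    """Partial sum of the series over nu = 1, ..., upto."""
    return sum(term(nu, a, b) for nu in range(1, upto + 1))


a, b = 5.0, 6.0
for nu in (10, 1000, 100000):
    print(nu, ratio(nu, a, b), math.e / (2.0 * a))
print(partial_sum(200, a, b), partial_sum(2000, a, b))
```

The partial sums keep growing roughly like a multiple of $\log \nu$, in line with the comparison with the harmonic series used in the text.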

It may now be interesting to go into some details of the effective construction
of any region similar to $W$ with respect to $\sigma, \xi_1, \ldots, \xi_N$. For this purpose it
is convenient to go back and express the identity (28), that the regions $w(\varphi)$
must satisfy, in terms of the relative probability law of $z_{s+1}, z_{s+2}, \ldots, z_m$ given
$\varphi_1, \varphi_2, \ldots, \varphi_s$. This is denoted by $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)$ and
defined for every system of values of the $\varphi$'s for which $p(\varphi_1, \varphi_2, \ldots, \varphi_s) > 0$ as
follows:

(43)  $$p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \varphi_2, \ldots, \varphi_s) = p(\varphi_1, \ldots, \varphi_s, z_{s+1}, \ldots, z_m)/p(\varphi_1, \ldots, \varphi_s).$$

Using (26), (27), and (43), the identity (28) can be rewritten in the following form

(44)  $$\int \cdots \int_{w'(\varphi)} p(z_{s+1}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)\, dz_{s+1} \cdots dz_m \equiv \epsilon.$$

The function under this integral is the relative elementary probability law
of $z_{s+1}, z_{s+2}, \ldots, z_m$ and it is integrated over the region $w'(\varphi)$. Therefore, the
left side of (44) is nothing but the relative probability of the point $E'$ falling in
$w'(\varphi)$ given that the first $s$ of its coordinates have the fixed values $\varphi_1, \ldots, \varphi_s$.
In other words, and owing to the one-to-one correspondence between the spaces
$W$ and $W'$, we have

(45)  $$P\{E' \in w'(\varphi) \mid E' \in W'(\varphi)\} \equiv P\{E \in w(\varphi) \mid E \in W(\varphi)\} \equiv \epsilon.$$

Now the general method of determining similar regions may be stated as
follows:

1. Choose any system of variables $z_{s+1}, z_{s+2}, \ldots, z_m$ such that their values
determine uniquely the position of the point $E'$ on any fixed hypersurface $W'(\varphi)$.
These $z$'s considered as functions of the $y$'s should be continuously differentiable
twice.

2. Find the relative probability law of the $z$'s given the $\varphi$'s. This must be
done for every possible set of values of the $\varphi$'s.

3. In the space of $z_{s+1}, z_{s+2}, \ldots, z_m$ consider regions which satisfy the equality
(44) identically in the $\theta$'s. Any such region could be taken to form a part of $w'$,
the region similar to the sample space, which we are trying to construct. If
the assumption that the differential equations (9) imply the existence of a
sufficient system of statistics for $\theta_1, \theta_2, \ldots, \theta_s$ is true, then (see [2], p. 366) the
probability law $p(z_{s+1}, z_{s+2}, \ldots, z_m \mid \varphi_1, \ldots, \varphi_s)$ will be independent of the
$\theta$'s and there will be an infinity of regions satisfying (44).

Obviously, instead of dealing directly with $\varphi_1, \varphi_2, \ldots, \varphi_s$ as described above,
we may select any system of statistics $T_1, T_2, \ldots, T_s$ such that the system of
equations $T_i = $ constant is equivalent to $\varphi_i = $ constant, for $i = 1, 2, \ldots, s$.
Returning to the particular problem of similar regions with respect to $\sigma$,
$\xi_1, \ldots, \xi_N$, we notice that instead of the $\varphi$'s we may consider

(46)  $T_1 = \sum_{i=1}^{N} S_i^2$ and $T_{i+1} = x_{i\cdot}$ for $i = 1, 2, \ldots, N$.

Now we wish to select a convenient system of variables, denoted by $z_{s+j}$'s in
the theory above, to determine the position of the point $E'$ on any hypersurface
$W'(\varphi)$ where all the functions (46) have fixed values. Obviously there is no
unique choice and we shall use what we find convenient. But notice that the
total number of these variables should be, in our case, $Nn - N - 1$. The
following system may be suggested.

If the sum $\sum S_i^2$ has a fixed value $T_1$ then none of the $S_i^2$ can exceed $T_1$. Write

(47)  $$S_i^2 = u_i T_1 \quad \text{for } i = 1, 2, \ldots, N - 1, \qquad S_N^2 = \Big(1 - \sum_{i=1}^{N-1} u_i\Big) T_1$$

and consider $u_1, u_2, \ldots, u_{N-1}$ as belonging to the system of variables sought.
The region of their variation is determined by the inequalities

(48)  $$0 < u_i \quad \text{and} \quad \sum_{i=1}^{N-1} u_i \le 1.$$

If the $u$'s are fixed then they, together with the value of $T_1$, determine the
values of $S_1, S_2, \ldots, S_N$. As the values of $x_{i\cdot} = T_{i+1}$ are already fixed, we
have to solve the problem of choosing for each $i = 1, 2, \ldots, N$ a system of
$n - 2$ variables, say $z_{i,1}, z_{i,2}, \ldots, z_{i,n-2}$, which with $x_{i\cdot}$ and $S_i$ will completely
determine the values of $x_{i,1}, x_{i,2}, \ldots, x_{i,n}$. However, this will only have to be
done if $n > 2$. Following the now familiar method (see, for example, [5], pp.
33-43), we may determine the $z_{i,j}$ in two consecutive steps. First write



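(49)  $$\begin{aligned}
x_{i,1} &= x_{i\cdot} + \frac{v_{i,1}}{\sqrt{1 \cdot 2}} + \frac{v_{i,2}}{\sqrt{2 \cdot 3}} + \cdots + \frac{v_{i,n-1}}{\sqrt{(n-1)n}} \\
x_{i,2} &= x_{i\cdot} - \frac{v_{i,1}}{\sqrt{1 \cdot 2}} + \frac{v_{i,2}}{\sqrt{2 \cdot 3}} + \cdots + \frac{v_{i,n-1}}{\sqrt{(n-1)n}} \\
x_{i,3} &= x_{i\cdot} - \frac{2 v_{i,2}}{\sqrt{2 \cdot 3}} + \cdots + \frac{v_{i,n-1}}{\sqrt{(n-1)n}} \\
&\;\;\vdots \\
x_{i,n} &= x_{i\cdot} - \frac{(n-1)\, v_{i,n-1}}{\sqrt{(n-1)n}}
\end{aligned}$$

where $v_{i,1}, v_{i,2}, \ldots, v_{i,n-1}$ are new variables satisfying the identity

(50)  $$\sum_{j=1}^{n-1} v_{i,j}^2 = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2.$$

We transform them further by putting

(51)  $$\begin{aligned}
v_{i,1} &= \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \cos z_{i,1} \\
v_{i,2} &= \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \cos z_{i,2} \sin z_{i,1} \\
v_{i,3} &= \sqrt{n}\, S_i \cos z_{i,n-2} \cos z_{i,n-3} \cdots \sin z_{i,2} \\
&\;\;\vdots \\
v_{i,n-1} &= \sqrt{n}\, S_i \sin z_{i,n-2}
\end{aligned}$$

with the $z$'s varying as follows

(52)  $$0 < z_{i,1} < 2\pi, \qquad -\pi/2 < z_{i,j} < \pi/2 \quad \text{for } j = 2, 3, \ldots, n - 2.$$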


Of course, instead of the $S_i$ we should put their expressions in terms of $T_1$ and
the $u$'s into (51). With the exception of a set of measure zero, which can be
ignored, the formulae above determine a one-to-one transformation of the
original space $W$ of the $x$'s into the space $W'$ of $T_1, T_2, \ldots, T_{N+1}$, $u_1, \ldots,
u_{N-1}$, and $z_{i,1}, z_{i,2}, \ldots, z_{i,n-2}$ for $i = 1, 2, \ldots, N$.
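
The two steps (49) and (51) can be sketched for a single group of observations. The code below is an illustrative sketch rather than the author's procedure: the sign convention follows the reconstruction of (49) in which $v_{i,j}$ is proportional to $x_{i,1} + \cdots + x_{i,j} - j\,x_{i,j+1}$, the helper names are hypothetical, and at least three observations per group are assumed so that there is at least one angle.

```python
import math
import random


def helmert_v(x):
    """Solve (49) for v_1, ..., v_{n-1} given one group x_1, ..., x_n."""
    v, partial = [], 0.0
    for j in range(1, len(x)):     # j = 1, ..., n-1
        partial += x[j - 1]        # x_1 + ... + x_j
        v.append((partial - j * x[j]) / math.sqrt(j * (j + 1)))
    return v


def helmert_x(mean, v):
    """Rebuild x_1, ..., x_n from the group mean and the v's, as in (49)."""
    n = len(v) + 1
    x = []
    for k in range(1, n + 1):
        total = mean
        for j in range(1, n):
            scale = math.sqrt(j * (j + 1))
            if k <= j:
                total += v[j - 1] / scale
            elif k == j + 1:
                total -= j * v[j - 1] / scale
        x.append(total)
    return x


def to_angles(v):
    """Invert (51): return r = sqrt(n) * S_i and the n - 2 angles z."""
    m = len(v)                     # m = n - 1, assumed >= 2
    r = math.sqrt(sum(t * t for t in v))
    z, rho = [0.0] * (m - 1), r
    for k in range(m - 1, 1, -1):  # angles z_{n-2}, ..., z_2
        z[k - 1] = math.asin(max(-1.0, min(1.0, v[k] / rho)))
        rho *= math.cos(z[k - 1])
    z[0] = math.atan2(v[1], v[0]) % (2.0 * math.pi)
    return r, z


def from_angles(r, z):
    """Formula (51): rebuild the v's from r and the angles."""
    m = len(z) + 1
    v, rho = [0.0] * m, r
    for k in range(m - 1, 0, -1):
        v[k] = rho * math.sin(z[k - 1])
        rho *= math.cos(z[k - 1])
    v[0] = rho
    return v


def roundtrip_error(x):
    """Largest discrepancy in identity (50) and in the round trip x -> z -> x."""
    mean = sum(x) / len(x)
    v = helmert_v(x)
    gap_50 = abs(sum(t * t for t in v) - sum((t - mean) ** 2 for t in x))
    r, z = to_angles(v)
    x_back = helmert_x(mean, from_angles(r, z))
    return max(gap_50, max(abs(p - q) for p, q in zip(x, x_back)))


rng = random.Random(5)
group = [rng.gauss(10.0, 2.0) for _ in range(6)]
print(roundtrip_error(group))
```

Here `r` equals $\sqrt{n}\,S_i$, so a group is fully described by its mean, its $S_i$ and the $n - 2$ angles, which is the count of variables used in the text.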

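In calculating the joint probability law of all the new variables, we notice
that, on the hypothesis tested, all the $Nn$ original variables are mutually
independent. Consequently, the transformations (49) and (51), which refer to
separate groups of the $x_{i,j}$'s, corresponding to fixed values of $i$, could be carried
through separately. In doing so, we use formulae deduced elsewhere (see [5],
pp. 38-39) directly and obtain

(53)  $$p(x_{i\cdot}, S_i, z_{i,1}, \ldots, z_{i,n-2}) = \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{n} S_i^{\,n-2}\, e^{-\frac{n}{2\sigma^2}\left[S_i^2 + (x_{i\cdot} - \xi_i)^2\right]} \prod_{j=2}^{n-2} \cos^{j-1} z_{i,j}.$$

It follows that

(54)  $$\begin{aligned}
p(x_{1\cdot}, \ldots, x_{N\cdot}, S_1, \ldots, S_N, z_{1,1}, \ldots, z_{N,n-2})
&= \prod_{i=1}^{N} p(x_{i\cdot}, S_i, z_{i,1}, \ldots, z_{i,n-2}) \\
&= \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{Nn} e^{-\frac{n}{2\sigma^2}\sum_{i=1}^{N}\left[S_i^2 + (x_{i\cdot} - \xi_i)^2\right]} \prod_{i=1}^{N} S_i^{\,n-2} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}.
\end{aligned}$$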

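We now wish to introduce $T_1$ and the $u_i$ instead of the $S_i$'s. Since all other
variables remain unchanged the Jacobian of this transformation reduces to that
of (47). Simple calculations show that

(55)  $$\left|\frac{\partial(S_1, S_2, \ldots, S_N)}{\partial(T_1, u_1, \ldots, u_{N-1})}\right| = 2^{-N}\, T_1^{\frac{1}{2}N - 1} \left(\Big(1 - \sum_{j=1}^{N-1} u_j\Big) \prod_{i=1}^{N-1} u_i\right)^{-\frac{1}{2}}.$$

Using this expression and substituting (47) in (54) we finally obtain

(56)  $$\begin{aligned}
p(x_{1\cdot}, \ldots, x_{N\cdot}, T_1, u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2})
&= 2^{-N} \left(\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\right)^{Nn} T_1^{\frac{1}{2}N(n-1) - 1}\, e^{-\frac{n}{2\sigma^2}\left[T_1 + \sum_{i=1}^{N}(x_{i\cdot} - \xi_i)^2\right]} \\
&\quad \cdot \left(\Big(1 - \sum_{i=1}^{N-1} u_i\Big) \prod_{i=1}^{N-1} u_i\right)^{\frac{1}{2}(n-3)} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}.
\end{aligned}$$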

To obtain the relative probability law of $u_1, u_2, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$
given $T_1$ and the $T_{i+1} = x_{i\cdot}$, we must calculate $p(T_1, T_2, \ldots, T_{N+1})$ and
divide expression (56) by it. Of course, $p(T_1, T_2, \ldots, T_{N+1})$ is obtained from
(56) by integrating over the whole of $W'(\varphi)$, that is, for all other variables
between the extreme limits of their variation. As these limits are independent of
the values of $T_1, T_2, \ldots, T_{N+1}$, the result will be

(57)  $$p(T_1, T_2, \ldots, T_{N+1}) = \frac{c}{\sigma^{Nn}}\, T_1^{\frac{1}{2}N(n-1) - 1}\, e^{-\frac{n}{2\sigma^2}\left[T_1 + \sum_{i=1}^{N}(T_{i+1} - \xi_i)^2\right]}$$

where $c$ denotes a constant. Thus

(58)  $$p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1}) = c_1 \left(\Big(1 - \sum_{i=1}^{N-1} u_i\Big) \prod_{i=1}^{N-1} u_i\right)^{\frac{1}{2}(n-3)} \prod_{k=1}^{N} \prod_{j=2}^{n-2} \cos^{j-1} z_{k,j}$$

with the region of variation $W'(\varphi)$ limited by the following inequalities

(59)  $$\begin{aligned}
&0 < u_i, \qquad \sum_{i=1}^{N-1} u_i \le 1, \\
&0 < z_{k,1} < 2\pi \quad \text{for } k = 1, 2, \ldots, N, \\
&-\pi/2 < z_{k,j} < \pi/2 \quad \text{for } j = 2, 3, \ldots, n - 2.
\end{aligned}$$

Since (58) integrated over $W'(\varphi)$ is identically unity, $c_1$ is a purely numerical
constant.

Now to construct any region $w$ similar to the sample space with respect to $\sigma$,
$\xi_1, \ldots, \xi_N$, we must select, separately for each and all systems $(\varphi)$ of values
of $T_1, T_2, \ldots, T_{N+1}$, a region $w'(\varphi)$, part of $W'(\varphi)$ as defined by (59), with the
sole restriction that

(60)  $$\int \cdots \int_{w'(\varphi)} p(u_1, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2} \mid T_1, \ldots, T_{N+1})\, du_1 \cdots dz_{N,n-2} = \epsilon.$$


Obviously, there is an infinity of ways of selecting any single one of such
regions. For example, we could let the $u$'s vary as indicated in (59) and limit
the $z$'s by

(61)  $0 < z_{k,1} < a$, $-a < z_{k,j} < a$ $(k = 1, 2, \ldots, N;\ j = 2, 3, \ldots, n - 2)$

where $a$ is chosen so that (60) is satisfied. This choice of $w'(\varphi)$ may correspond
to one particular system of values of $T_1, T_2, \ldots, T_{N+1}$ and no other. Again,
the same region (61) may be chosen to serve for all systems of values of the $T$'s.
In this case, the region $w = \sum_{\varphi} w(\varphi)$ might be described as cylindrical. Any
such region $w$ will control errors of the first kind in testing $H$ to the same level
of significance $\epsilon$ and, as far as these errors alone are concerned, each of these
regions is of equal value. Whatever the choice of regions $w'(\varphi)$ or $w(\varphi)$, the
test of $H$ will consist of (1) observing the values of the $x_{i,j}$'s, (2) calculating the
corresponding value of $T_1, T_2, \ldots, T_{N+1}$, the $u$'s, and the $z$'s, and (3) noting
whether the point with coordinates $u_1, u_2, \ldots, u_{N-1}, z_{1,1}, \ldots, z_{N,n-2}$ falls in
the region $w'(\varphi)$ chosen to correspond to the observed values of $T_1, T_2, \ldots,
T_{N+1}$. Of course, in practical cases, the choice of $w'(\varphi)$ for one system of values
of the $T$'s will not be quite unconnected with that for others. On the contrary,
there will probably be some more or less simple rule connecting $w'(\varphi)$ with the
corresponding systems of the $T$'s. As a result, the actual machinery of the test
will be much simpler than that described above and will consist of the
calculation of only a very few functions of the $x$'s and in checking some simple
inequalities.
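
The idea of a region that controls errors of the first kind whatever the nuisance parameters can be illustrated by simulation. The sketch below is a hedged example and not a region proposed in the text: it uses the purely illustrative "cylindrical" region $u_1 > 0.5$, defined through the $u$'s alone, and estimates its probability under $H$ for two very different choices of $\sigma$ and the $\xi_i$. The function names, thresholds and seeds are assumptions made for the example.

```python
import random


def u_first(sample):
    """Return u_1 = S_1^2 / sum_i S_i^2 for observations sample[i][j]."""
    n = len(sample[0])
    s2 = []
    for group in sample:
        mean = sum(group) / n
        s2.append(sum((x - mean) ** 2 for x in group) / n)
    return s2[0] / sum(s2)


def rejection_rate(sigma, xi, n, threshold, trials, seed):
    """Monte Carlo estimate of P{u_1 > threshold} when H is true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under H every group has the same sigma; the means xi may differ.
        sample = [[rng.gauss(m, sigma) for _ in range(n)] for m in xi]
        if u_first(sample) > threshold:
            hits += 1
    return hits / trials


# Two settings of the nuisance parameters, same region.
print(rejection_rate(1.0, [0.0, 0.0, 0.0], 4, 0.5, 2000, seed=1))
print(rejection_rate(7.5, [10.0, -3.0, 2.5], 4, 0.5, 2000, seed=2))
```

With more trials the two estimates agree within sampling error, as expected for a region whose definition involves only variables with a relative probability law, such as (58), that is free of $\sigma$ and the $\xi_i$.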

Now our purpose is to select a region from the infinite family $F(\epsilon)$ of all
regions similar to the sample space with respect to $\sigma, \xi_1, \ldots, \xi_N$ which we judge
most satisfactory for controlling errors of the second kind. Roughly speaking,
this region will have to be such that, if the hypothesis $H$ is not true, the observed
point $E$ will fall in this particular region as frequently as possible, in general.
Here we come to the necessity of specifying the ways in which we expect the
hypothesis $H$ to be untrue. It may be untrue in an infinite number of ways.
For example, (1) the values of the $\sigma$'s may be equally distributed over any given
range, (2) they may fall into just two groups $\sigma_i = 1$ and $\sigma_j = 2$, or (3) all $\sigma_i$'s except
the last may have the same value $\sigma$ while the last is $10\sigma$, and so forth. Any
such assumption will be called an hypothesis alternative to $H$. It is obvious
that the probability of $E$ falling in any given region $w$ will be different for each
of them. Therefore, if we wish to deduce a test which will detect the falsehood
of the hypothesis tested frequently, we must analyse the practical cases where
the test is to be applied and guess the ways in which the hypothesis tested is
usually wrong. Then we can deduce a test which will be, in one sense or
another, most sensitive to the assumed deviations from the hypothesis tested.
Needless to say, our guess may be right or wrong. In the latter case, an
increased volume of observational material may demonstrate its fallacy and
suggest the necessary modifications. In any case, it is important to know exactly
the class of alternatives for which our test is, in some particular way, the best.

5. The set of hypotheses alternative to H. Let us consider the routine
analyses made at some laboratory and try to discover the circumstances likely to
cause variation in their accuracy. First of all, we may think of assignable
causes such as a change in personnel, apparatus, or accommodation. These
and similar causes are likely to produce lasting effects; the test of the hypothesis
that they did not reduces to one of the equality of only two $\sigma$'s. An easy
application of known theory [20] shows that the familiar $F$ or $z$ test is unbiased
of type $B_1$, which means that it is preferable to any other. Consequently,
situations of this kind and also similar ones for which the $L_1$ test is applicable [9],
need not be considered here, so that we may concentrate on cases where there is
no directly assignable cause of variation in the accuracy of the analyses.
Assume then that the personnel, the apparatus, the accommodation, etc., remain
the same. Now the accuracy of analyses depends on a multitude of causes
evading identification, such as changes in the efficiency of the workers. In
principle, they try to have the highest, and therefore a constant, level of accuracy.
Uncontrollable circumstances cause some fluctuations about a certain average
and we expect that small deviations from this average will occur more frequently
than large ones. With this in mind, the author feels that it would be
appropriate to expect that variations in accuracy, if any, will have a random character
so that any $\sigma_i$ referring to one particular group of analyses, or any monotonic
function of that $\sigma_i$, could be considered as an essentially positive random variable,
having some unimodal probability law. To make the problem of the best test
sufficiently specific, we must specify this law entirely. Here we face a
somewhat embarrassing freedom of choice. For lack of more precise information as
to the random variability of $\sigma_i$, we guide ourselves by considerations of ease in
calculations. From this point of view it is convenient to consider the variable

(62)  $$h = 1/\sigma^2$$

and assume that, within a given period of time which is not too long, when the
conditions in a laboratory are sensibly constant, it is varying according to the law

(63)  $$p(h) = \beta^{\alpha} h^{\alpha - 1} e^{-\beta h}/\Gamma(\alpha) \quad \text{for } 0 < h,$$

where $\alpha$ and $\beta$ are unknown non-negative constants. It is useful to express these
constants in terms of two new ones which have an obvious interpretation: $h_0$, the
expectation of $h$, and $\nu$, the square of the coefficient of variation of $h$. Easy
calculations give

(64)  $$\alpha = 1/\nu, \qquad \beta = 1/h_0\nu.$$

Now $p(h)$ has the form

(65)  $$p(h) = \frac{1}{(h_0\nu)^{1/\nu}\, \Gamma(1/\nu)}\, h^{1/\nu - 1}\, e^{-h/h_0\nu}.$$

We note that when $\nu \to 0$ the probability law (65) tends to a limiting
discontinuous form with $P\{h = h_0\} = 1$. This corresponds to the hypothesis H
that we wish to test. The type of law represented by (65) is known to be
rather flexible. Consequently, we may reasonably assume that even though the true
variability of $h$ (or $\sigma$) does not exactly correspond to (65), there will be a system
of values of $h_0$ and $\nu$ for which the difference between the true law and (65) will
not be large. Therefore, a test which is particularly sensitive to deviations of $\nu$
from zero with law (65) will be reasonably sensitive in real practical cases.
This, however, is an assumption by the author; it is subject to test, and this
will be done below.
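
As an illustration of the parametrization (64), the following Python sketch draws values of $h$ from the law (65) and checks that their mean and squared coefficient of variation are close to $h_0$ and $\nu$. The function names, the seed and the sample size are illustrative assumptions and are not part of the paper.

```python
import random


def gamma_parameters(h0, nu):
    """Return (alpha, beta) of law (63) from h0 = E(h) and nu = squared CV, as in (64)."""
    alpha = 1.0 / nu
    beta = 1.0 / (h0 * nu)
    return alpha, beta


def sample_precisions(h0, nu, size, seed=0):
    """Draw `size` values of h from (65).

    random.gammavariate takes (shape, scale), so the scale is 1 / beta.
    """
    alpha, beta = gamma_parameters(h0, nu)
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / beta) for _ in range(size)]


def mean_and_squared_cv(values):
    """Sample mean and squared coefficient of variation of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var / mean ** 2
```

For small $\nu$ the sampled values crowd around $h_0$, which is the limiting case corresponding to the hypothesis H.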

Formula (63) represents the hypothetical probability law of the variable $h$,
which is not directly observable. We must use this formula to obtain the
probability law of the observable $x$'s alternative to (6), which corresponds to the
hypothesis H being true. Using $h = 1/\sigma_i^2$, we write the relative probability law
of $x_{i,1}, x_{i,2}, \cdots, x_{i,n}$ given $h$

(66)    $p(x_{i,1}, \cdots, x_{i,n} \mid h) = \left(\dfrac{h}{2\pi}\right)^{n/2} \exp\left\{-\tfrac{1}{2}\, h \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\right\}.$

Multiplying (66) by (65) we obtain the joint probability law of $h$ and the $x_{i,j}$'s
referring to one group of analyses

(67)    $p(h, x_{i,1}, \cdots, x_{i,n}) = \dfrac{1}{(2\pi)^{n/2} (h_0\nu)^{1/\nu}\, \Gamma(1/\nu)}\; h^{n/2 + 1/\nu - 1} \exp\left\{-h\left[\dfrac{1}{h_0\nu} + \tfrac{1}{2} \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\right]\right\}.$

Integrating (67) with respect to $h$ from zero to infinity, we obtain the absolute
probability law of $x_{i,1}, x_{i,2}, \cdots, x_{i,n}$, all referring to the $i$th group of analyses.

Assuming that the value of $h$ in one group of analyses is independent of that in
another, we obtain the joint probability law of all the $Nn$ observable $x_{i,j}$'s by
simply multiplying the probability laws referring to particular groups of $n$ of
them. The result will depend on $N + 2$ unknown parameters,
$\xi_1, \cdots, \xi_N$, $h_0$, and $\nu$. As the last two will play a more important role than the others
we shall denote the probability law by $p(E \mid h_0, \nu)$. Easy calculations give

(68)    $p(E \mid h_0, \nu) = \left(\dfrac{\Gamma(n/2 + 1/\nu)}{(2\pi)^{n/2}\, \Gamma(1/\nu)}\right)^{N} \prod_{i=1}^{N} \dfrac{(h_0\nu)^{n/2}}{\left(1 + \tfrac{1}{2} h_0\nu \sum_{j=1}^{n} (x_{i,j} - \xi_i)^2\right)^{n/2 + 1/\nu}}.$

We easily check that for $\nu \to 0$ (68) approaches the law (6) with $h_0 = \sigma^{-2}$.
Therefore, the problem that we shall treat below will be to assume that the
observable $x$'s follow (68) with some $h_0 > 0$ and some $\nu \ge 0$ and to test the
hypothesis H that $\nu = 0$. More specifically, we shall try to choose among all
the regions of the family $F(\epsilon)$, found in the preceding section, the one over which
the integral of the function (68) is, in general, the largest.

Before doing so, it may be useful to exhibit some experimental evidence in
favor of the assumption that, if $\sigma$ is not constant in some conditions of analysis
or measurement, then it varies in such a way that the variability of the $x$'s has
at least some characteristics appropriate to (68).

Introduce the notation

(69)    $w_i = \sum_{j=1}^{n} (x_{i,j} - x_{i\cdot})^2.$

Using transformations (49), (50), and (69), successively, we easily deduce the
probability law of $w_i$

(70)    $p(w_i) = \dfrac{(h_0\nu/2)^{\frac{1}{2}(n-1)}\, \Gamma(\tfrac{1}{2}(n-1) + 1/\nu)}{\Gamma(\tfrac{1}{2}(n-1))\, \Gamma(1/\nu)} \cdot \dfrac{w_i^{\frac{1}{2}(n-3)}}{(1 + h_0\nu w_i/2)^{\frac{1}{2}(n-1) + 1/\nu}}.$

If the hypothesis we have made about the variability of $h$, as expressed by (65),
is true in any particular case then the sums of squares (69), referring to each
particular group of analyses, are distributed according to (70). The reverse is
not necessarily true, of course, but it is comforting that a check of the above
in a number of broadly divergent circumstances gives satisfactory results. By
a change of variable based on $1 + h_0\nu w_i/2$, the integral of (70) is easily
reduced to an Incomplete Beta function, whence Pearson's tables [24] provide
an easy means of calculating the theoretical probability that $w_i$ is within any
given limits.
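
In place of tables, the reduction to an Incomplete Beta function can be sketched numerically. The Python fragment below assumes the substitution $y = x/(1+x)$ with $x = h_0\nu w_i/2$, under which $y$ follows a Beta law with parameters $\tfrac{1}{2}(n-1)$ and $1/\nu$; this particular substitution, the function names and the use of Simpson's rule are assumptions of the sketch, and the code is restricted to $n \ge 3$ so that the integrand stays bounded at zero.

```python
import math


def beta_pdf(y, a, b):
    """Density of the Beta(a, b) law at 0 < y < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def prob_w_below(w, n, h0, nu, steps=2000):
    """P{w_i <= w} under (70) for n >= 3, by composite Simpson's rule.

    Assumes y = x / (1 + x), x = h0 * nu * w / 2, so that y ~ Beta((n - 1) / 2, 1 / nu).
    `steps` must be even.
    """
    if n < 3:
        raise ValueError("this sketch assumes n >= 3")
    if w <= 0.0:
        return 0.0
    a = 0.5 * (n - 1)
    b = 1.0 / nu
    x = 0.5 * h0 * nu * w
    upper = x / (1.0 + x)

    def density(y):
        # Limit of the Beta density at y = 0: equal to b when a == 1, else 0 (a > 1).
        if y <= 0.0:
            return b if a == 1.0 else 0.0
        return beta_pdf(y, a, b)

    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3.0
```

The probability that $w_i$ lies between two limits is then the difference of two such values.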

Table I gives several observed distributions of the sums $w$ together with their
expected ones, calculated from (70) with the values of $h_0$ and $\nu$ fitted by the
method of moments. The last lines give particulars of the application of the
$\chi^2$ test for goodness of fit.

The origin of the data used to compile Table I is as follows:

For the data providing frequency distributions numbered 1 and 2, the author
is deeply indebted to Professor Raymond T. Birge. The methods of measurement
and their purpose are explained in the publications [25] and [26], respec-

TABLE I

Comparison of empirical distributions of w with those calculated from (70)

Number   Author or Source of Data                        Kind of Measurement or Analysis
  1      R. T. Birge                                      Strong Lines in the Band Spectra of Nitrogen
  2      R. T. Birge                                      A Solar Spectrum Line
  3      K. Buszczyński and Sons, Ltd.                    Sugar Content of Beets
  4      A. A. Michelson, F. G. Pease, and F. Pearson     Velocity of Light
  5      W. S. Svenson                                    Octane Rating

Frequency (Exp. = expected, Obs. = observed)

  w        No. 1           No. 2           No. 3           No. 4           No. 5
         Exp.   Obs.     Exp.   Obs.     Exp.   Obs.     Exp.   Obs.     Exp.   Obs.
 0-1    29.38    29     15.10    17     15.56    16      3.50     2     14.90    17
 1-2    19.30    20     13.14    11     12.67    17      7.73    10     18.88    16
 2-3    13.11    17     11.39    15     10.70    13      9.37    13     16.83    14
 3-4     9.16     7      9.84     5      8.98     2      9.66     8     13.93    12
 4-5     6.56     6      8.46     9      7.53    11      9.28    17     11.20    10
 5-6     4.80     1      7.24     9      6.34     4      8.60     7      8.91     7
 6-7     3.59     4      6.17    11      5.36     3      7.80     7      7.04    10

7-8 

1 4-80 

1 

5-23 

4 

4-54 

7 

6-99 

7 

5-58 

9 

8-9 

3 

4-40 

2 

3-86 

4 

6-22 

4 

4-43 

7 

9-10 

1 

2 

3-69 

2 

1 6-09 

• 4-45 

0 

5-52 

4 

3-52 

7 

10-11 

11-12 

3-94 

0 

0 

1 5-63 

2 

1 

5 

0 

4-88 

4-32 

3 

5 

1 5-08 

3 

1 

12-13 

j 

4 

1 3-76 

3 

0 

3-82 

3 

] 

0 

13-14 

[ 5-36 

1 

1 


5 

■ 6-37 

2 

4-51 

1 

14-15 


0 

1 

1 

4*61 

1 

5 

1 

0 

15-16 


0 

[ 5-95 

3 


0 

. 5-03 

1 

1 

1 

16-17 


1 


1 


0 

0 

[ 6-18 

1 

17-18 


0 


1 


3 

4-00 

3 


1 

18-19 


0 


1 

4-37 

1 

2 


2 

19-20 


1 


1 


0 


2 


0 

20-21 


1 




1 

4-55 

0 


0 

21-22 


0 




0 


1 


0 

22-23 


0 



[ 4-94 

1 


2 


0 

23-24 


0 



0 

4-23 

1 


0 

24-25 


0 




0 

1 


0 

25-26 


0 




1 


1 


0 

26-32 


2 




4 

3-94 

3 


1 

32-43 






1 

3-58 

3 


1 

>43 





{ 


3-61 

6 



            No. 1           No. 2           No. 3           No. 4           No. 5
          Exp.   Obs.     Exp.   Obs.     Exp.   Obs.     Exp.   Obs.     Exp.   Obs.
Total   100.00   100    100.00   100    100.00   100    123.00   123    120.99   121

$\chi^2$          9.63           12.67           18.75           18.09           13.35
Degrees of
Freedom         7              10              11              18              10
$P(\chi^2)$      .21             .24            .066             .45             .21

The symbols { are used to indicate the groupings used in the calculation of
the $\chi^2$. The groupings were made so as to have the expected frequency in a
class at least equal to 3.5.

tively. These papers also contain various compilations of the results of the
measurements. However, the original single measurements, necessary for the
present paper, are naturally unpublished and Professor Birge was kind enough
to find them for the author in his records.

Frequency distribution No. 3 was compiled from a book of records of sugar
beet trials carried out by Messrs. K. Buszczyński and Sons, Ltd. in Górka
Narodowa, Poland.

The 4th distribution was constructed from the original measurements of the
velocity of light as published [27] by Michelson, Pease, and Pearson. The
measurements made during single days were treated as forming separate groups.

Distribution No. 5 originated from repeated measurements of Octane Rating
conducted by a refining company in California. They were made accessible by
Mr. Walter S. Svenson and it is a pleasure to express the author's deep gratitude
to him.

The number of observations in each column is not very large. It may be
expected that if it were increased, the differences between the hypothetical
distributions and the observed ones would become more apparent. It seems
safe, however, to assume that in a number of instances the hypothesis as to the
character of the variability of $\sigma_i$ is not in very bad disagreement with the actual
facts. It would be most interesting to have some more data on the subject.

6. The best critical region for testing H against a particular alternative. It
seems unquestionable that the most desirable test of any hypothesis is the
uniformly most powerful test (U. M. P. Test) with respect to the whole class of
simple hypotheses alternative to the one which is being tested. Denote by H
the hypothesis tested, by $h$ any simple admissible hypothesis alternative to H,
and by $\Omega$ the set of all $h$'s. If $w_0$ is the critical region corresponding to the
U. M. P. Test, then $w_0$ has these properties:

(71)    (1) $P\{E \in w_0 \mid H\} = \epsilon$,

(2) If $w$ is any other region such that $P\{E \in w \mid H\} = \epsilon$ then

(72)    $P\{E \in w_0 \mid h\} \ge P\{E \in w \mid h\}$,

whatever be $h \in \Omega$.

Following the known method [18], we shall see whether a test of the hypothesis
H considered in the preceding sections exists which is a U. M. P. Test with
respect to the whole class of admissible hypotheses that specify the probability
laws (68) with any $h_0 > 0$ and $\nu > 0$.

The method consists of considering one particular alternative hypothesis $h'$,
that is, one particular set of values of $h_0 > 0$ and $\nu > 0$, and finding the best
critical region for testing H against $h'$. If this region appears to depend
on $\nu$ and/or on $h_0$ then there is no U. M. P. Test. The region is found by
determining, for each system ($\varphi$) of values of $T_1, T_2, \cdots, T_{N+1}$ separately, a part
determined by the inequality

(73)    $p(E \mid h_0, \nu) \ge k(\varphi)\, p(E \mid H)$

where $k(\varphi)$ is a function of $T_1, T_2, \cdots, T_{N+1}$ so determined that the relation
(60) is satisfied. Substituting (6) and (68) in (73), taking the logarithm of both
sides, and combining all terms which are constant or depend only on $T_1, T_2, \cdots,
T_{N+1}$, we have

(74)    $\sum_{i=1}^{N} \log\left(1 + \tfrac{1}{2} h_0\nu n \left(S_i^2 + (T_{i+1} - \xi_i)^2\right)\right) \le k_1(T_1, \cdots, T_{N+1})$, (say).

Clearly, for $T_1, T_2, \cdots, T_{N+1}$ fixed, this inequality imposes a restriction on the
variability of $u_1, u_2, \cdots, u_{N-1}$ while $z_{1,1}, \cdots, z_{N,n-2}$ are allowed to vary
indiscriminately within the extreme limits (52). But the region $w_{h_0,\nu}(\varphi)$ determined
by (74) also depends on the product $h_0\nu$. Therefore, there is no uniformly most
powerful test for testing H against any and all simple alternatives specifying (68).

7. A critical region of an unbiased type. There seem to be no grounds for
dissension that when a U. M. P. Test exists and is readily applicable, it is
preferable to any other test, but the situation is quite different when there is no
U. M. P. Test. In such cases, practical considerations may suggest a variety
of requirements for a second best test of the hypothesis. Among these, we may
suggest the following considerations:

Fix, for a moment, the values of $h_0, \xi_1, \cdots, \xi_N$, take any region $w$ of the
family $F(\epsilon)$, and consider the probability of E falling in $w$ as a function of $\nu$
only. This is called the power function

(75)    $\beta(\nu \mid w) = \int \cdots \int_{w} p(E \mid h_0, \nu)\, dx_{1,1} \cdots dx_{N,n}.$

Here, of course, $\nu \ge 0$. Because of the properties of regions belonging to $F(\epsilon)$
we have $\beta(0 \mid w) = \epsilon$. If $\nu > 0$, the value of $\beta(\nu \mid w)$ represents the
corresponding probability of the test (based on $w$) discovering the falsehood of H.
It is obviously desirable to have this probability as large as possible. In any
case, it should be greater than $\epsilon$. This last restriction is known as that of
unbiasedness [19], [20], [28]. Further, since it is impossible to maximize $\beta(\nu \mid w)$
for all values of $\nu$, we must choose those for which it is most desirable, in our
opinion, to concentrate our efforts to increase $\beta(\nu \mid w)$. One possible point of
view is that these values should be very close to the hypothetical value $\nu = 0$.
For if $\nu$ is considerably larger than zero, we may argue that there will be no
need to apply any refined statistical test to detect the falsehood of H. Of
course, this argument has no mathematical character and its general acceptance
is not suggested. In fact, we may argue that if $\nu$ is greater than zero but very
small, it will be almost impossible to detect the falsehood of H by any test and,
therefore, our efforts should be concentrated on values of $\nu$ which are of
considerable size.

These are considerations of non-mathematical character; the role of
mathematical statistics is limited to devising tests and elucidating their properties.
If these last are understood by practical statisticians, each may choose according
to his problem. Note that what could be termed the "properties" of a test are
summarized in the power function $\beta(\nu \mid w)$ with its relation to the power
functions of other possible tests of the same hypothesis.

In this paper we shall deal with tests particularly sensitive to small deviations
of $\nu$ from its hypothetical value $\nu = 0$. In this respect, our first trial is to find
a region $w_0$, belonging to the family $F(\epsilon)$ and satisfying the condition

(76)    $\left.\dfrac{\partial \beta(\nu \mid w_0)}{\partial \nu}\right|_{\nu=0} \ge \left.\dfrac{\partial \beta(\nu \mid w)}{\partial \nu}\right|_{\nu=0},$

where $w$ is any other region belonging to the same family $F(\epsilon)$.

Because of the peculiar structure of the regions belonging to $F(\epsilon)$, the problem
is immediately reduced to finding regions $w_0(\varphi)$. According to theory explained
elsewhere [18] these should satisfy the condition

(77)    $\left.\dfrac{\partial p(E \mid h_0, \nu)}{\partial \nu}\right|_{\nu=0} \ge k(T)\, p(E \mid H),$

where $k(T)$ depends on $T_1, T_2, \cdots, T_{N+1}$ only and is determined to satisfy the
condition of similarity (60). Condition (77) is equivalent to

(78)    $\left.\dfrac{\partial \log p(E \mid h_0, \nu)}{\partial \nu}\right|_{\nu=0} \ge k_1(T).$

Taking the logarithm of (68), differentiating with respect to $\nu$, putting $\nu$ equal
to zero, substituting in (78), and combining all the terms which are constant
on $W(\varphi)$ into a single term which we may write as $k_2(T)$, we have

(79)    $\sum_{i=1}^{N} \left(S_i^2 + (T_{i+1} - \xi_i)^2\right)^2 \ge k_2(T).$

We note that condition (79) determining, so to speak, the shape of the region
$w_0(\varphi)$ does not imply any restriction on the variability of the $z$'s but only on
the $u$'s. However, the region $w_0(\varphi)$ as determined by (79) has the disadvantage
of being dependent on the values of the $\xi_i$. Since these are not specified by
the hypothesis tested, we are not able to determine the critical regions belonging
to the family $F(\epsilon)$ and maximizing the derivative $\partial\beta(\nu \mid w)/\partial\nu]_{\nu=0}$. The region
which does so for some particular system $\xi_1, \xi_2, \cdots, \xi_N$ of values of the $\xi$'s
will lose this property if the system of values of the $\xi$'s is appropriately changed.
Therefore, our choice of the region maximizing the derivative of the power
function at $\nu = 0$ should be made not from the whole family $F(\epsilon)$ but from a
subfamily $F_1(\epsilon)$ composed only of such regions which also possess the supplementary
property that

(80)    $\left.\dfrac{\partial \beta(\nu \mid w)}{\partial \nu}\right|_{\nu=0} = \text{constant}$

has a value independent of $\xi_1, \xi_2, \cdots, \xi_N$. The determination of this
subfamily $F_1(\epsilon)$ embracing all such regions is an interesting problem. Until it is
solved, we use an obvious subfamily $F_2(\epsilon)$ of regions $w$ which have the desired
property, but we do not know whether or not $F_2(\epsilon)$ contains all such regions.*

The family $F_2(\epsilon)$ is defined as consisting of those regions belonging to $F(\epsilon)$
which could be described as cylindrical with their generators parallel to the
intersection of $T_{i+1} = x_{i\cdot} = $ constant, for $i = 1, 2, \cdots, N$. In other words
and more precisely, a region $w$ of the family $F(\epsilon)$ belongs to $F_2(\epsilon)$ only if the
question of its including a given point E depends on $Nn - N$ of its coordinates,
namely on $T_1, u_1, \cdots, u_{N-1}, z_{1,1}, \cdots, z_{N,n-2}$ and not on $T_2, T_3, \cdots, T_{N+1}$.

We easily show that any region $w$ belonging to $F_2(\epsilon)$ possesses the property
that its power function is independent of the $\xi_i$'s. Denote by $w'$ the set of
systems of values of $T_1, u_1, \cdots, u_{N-1}, z_{1,1}, \cdots, z_{N,n-2}$ corresponding to points
included in any given region $w$ of the family $F_2(\epsilon)$. We see that the power
function $\beta(\nu \mid w)$, equal to the integral of (68) over $w$, can be calculated by using
the transformations (47), (49), and (51). Then the region of integration for
$T_1, u_1, \cdots, u_{N-1}, z_{1,1}, \cdots, z_{N,n-2}$ is what we have just denoted by $w'$ and the
integrations for $T_{i+1} = x_{i\cdot}$ extend from $-\infty$ to $+\infty$ irrespective of the fixed
values of the other variables. These integrations are easily carried out by
substituting

(81)    $\tfrac{1}{2} n h_0\nu (x_{i\cdot} - \xi_i)^2 = (1 + \tfrac{1}{2} n h_0\nu S_i^2)\, t_i^2.$

The final result is

(82)    $\beta(\nu \mid w) = \int \cdots \int_{w'} p(T_1, u_1, \cdots, u_{N-1}, z_{1,1}, \cdots, z_{N,n-2})\, dT_1 \cdots dz_{N,n-2}.$

Here

(83)    $p(T_1, u_1, \cdots, u_{N-1}, z_{1,1}, \cdots, z_{N,n-2}) = c(\nu)\, f(T_1, u, z) \Big/ \prod_{i=1}^{N} (1 + \tfrac{1}{2} n h_0\nu S_i^2)^{\frac{1}{2}(n-1) + 1/\nu}$

where $c(\nu)$ denotes a constant depending on $\nu$, $f(T_1, u, z)$ denotes a function of
all the $N(n - 1)$ variables involved, independent of $\nu$, and $S_i^2$ denotes expressions
(47) for short. We see that (82) is independent of the $\xi_i$'s.

Since the region $w$ belongs to $F(\epsilon)$, it is composed of sections $w(\varphi)$ selected
separately on each hypersurface $T_1 = $ constant and $T_{i+1} = $ constant, $i = 1,
2, \cdots, N$. Because of the definition of the family $F_2(\epsilon)$, the sections $w(\varphi)$ are
independent of $T_2, T_3, \cdots, T_{N+1}$ so that each of them can be selected only in
accordance with the value of $T_1$. Therefore, we may denote them by $w(T_1)$.
As far as property (80) is concerned, the choice is arbitrary. But the property
of similarity requires the fulfillment of condition (60) which, in the present case,
reduces to

(84)    $\int \cdots \int_{w(T_1)} p(u_1, \cdots, u_{N-1}, z_{1,1}, \cdots, z_{N,n-2} \mid T_1, \cdots, T_{N+1})\, du_1 \cdots dz_{N,n-2} = \epsilon.$

* Regions with the property (80) and belonging to $F(\epsilon)$ but not to $F_2(\epsilon)$ exist. Probably,
however, each of them differs from one of the regions of $F_2(\epsilon)$ by a set of measure zero only.

Applying the method already used, we find that sections $w(T_1)$ of the region $w$
belonging to $F_2(\epsilon)$ and maximizing the derivative $\partial\beta(\nu \mid w)/\partial\nu]_{\nu=0}$ are determined,
separately for each value of $T_1$, by the inequality

(85)    $\left.\dfrac{\partial \log p(T_1, u_1, \cdots, u_{N-1}, z_{1,1}, \cdots, z_{N,n-2})}{\partial \nu}\right|_{\nu=0} \ge k_2(T_1)$

where $k_2(T_1)$ denotes a function of $T_1$ determined to satisfy (84).
Substituting (83) in (85) we easily find that this condition is equivalent to

(86)    $\zeta = \sum_{i=1}^{N-1} u_i^2 + \left(1 - \sum_{i=1}^{N-1} u_i\right)^2 \ge k_3(T_1),$

where, again, $k_3(T_1)$ is determined for each particular value of $T_1$ to satisfy (84).
As (86) does not imply any restrictions on the variability of $z_{1,1}, z_{1,2}, \cdots, z_{N,n-2}$,
the integrations for the $z$'s while calculating (84) must be carried out over the
extreme limits (52). This will reduce the integrand to the relative probability
law of $u_1, u_2, \cdots, u_{N-1}$ given all the $T$'s. This law is easily calculated from
(58) and is

(87)    $p(u_1, u_2, \cdots, u_{N-1} \mid T_1, T_2, \cdots, T_{N+1}) = p(u_1, u_2, \cdots, u_{N-1}) = \dfrac{\Gamma(\tfrac{1}{2} N(n-1))}{[\Gamma(\tfrac{1}{2}(n-1))]^{N}} \left[u_1 u_2 \cdots u_{N-1}\left(1 - \sum_{i=1}^{N-1} u_i\right)\right]^{\frac{1}{2}(n-3)}.$

As (87) is independent of $T_1, T_2, \cdots, T_{N+1}$, it is also the absolute probability
law of the $u$'s and hence $k_3(T_1)$ is independent of $T_1$. In accordance with the
notation adopted for the left side of (86), namely $\zeta$, and since the choice of
$k_3(T_1)$ depends on $\epsilon$, $n$, and $N$, we may use $\zeta_\epsilon$ instead of $k_3(T_1)$. Then the region
$w$ is determined by the inequality

(88)    $\zeta = \sum_{i=1}^{N-1} u_i^2 + \left(1 - \sum_{i=1}^{N-1} u_i\right)^2 \ge \zeta_\epsilon,$

or, returning to the original variables, by the inequality

(89)    $\zeta = \sum_{i=1}^{N} S_i^4 \Big/ \left(\sum_{i=1}^{N} S_i^2\right)^2 \ge \zeta_\epsilon,$

where $\zeta_\epsilon$ is the root of the equation

(90)    $P\{\zeta \ge \zeta_\epsilon \mid H\} = \epsilon.$

This region $w$ has the following property: of all the regions belonging to the
family $F_2(\epsilon)$, the derivative of the power function of $w$ at the point $\nu = 0$ is the
greatest. Thus, as far as the values of $\nu$ close to zero are concerned, we may
say that, for testing H, $w$ is the most powerful critical region in the family $F_2(\epsilon)$.

8. Methods of determining $\zeta_\epsilon$. To calculate $\zeta_\epsilon$ accurately we must calculate
the integral probability law of $\zeta$, that is to say,

(91)    $P\{\zeta < z\} = \int \cdots \int_{\zeta < z} p(u_1, \cdots, u_{N-1})\, du_1 \cdots du_{N-1}$

for any $z$. The author was not able to achieve this. Therefore some methods
of approximation had to be looked for. This task becomes somewhat simplified
by noting that in most practical problems $N$ will be very large, in the hundreds
or thousands, while $n$ will probably not exceed 5.

To start, we notice that the range of $\zeta$ is limited by

(92)    $1/N \le \zeta \le 1.$

The easiest way to see this is to look for maxima and minima of the sum

(93)    $X = \sum_{i=1}^{N} S_i^4$

subject to the restriction that

(94)    $\sum_{i=1}^{N} S_i^2 = T_1.$

We then easily find that

(95)    $T_1^2/N \le X \le T_1^2$

and (92) follows directly.
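
The criterion $\zeta$ of (89) and its bounds (92) can be illustrated by a short Python sketch. The helper names are hypothetical, and $S_i^2$ is taken here as the mean squared deviation within a group (divisor $n$); this is an assumption, since expression (47) is not reproduced in this section, but $\zeta$ is a ratio and does not change when every $S_i^2$ is multiplied by the same constant. The critical value `zeta_eps` must be supplied by the user.

```python
def group_variance(values):
    """S_i^2 for one group: mean squared deviation from the group mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def zeta_statistic(groups):
    """Criterion (89): sum of S_i^4 divided by the square of the sum of S_i^2."""
    s2 = [group_variance(g) for g in groups]
    total = sum(s2)
    if total == 0.0:
        raise ValueError("all groups are constant; zeta is undefined")
    return sum(v * v for v in s2) / total ** 2


def reject(groups, zeta_eps):
    """Reject H when zeta >= zeta_eps, as in (89)."""
    return zeta_statistic(groups) >= zeta_eps
```

When all groups have the same $S_i^2$ the statistic equals $1/N$, and when a single group carries all the variation it equals 1, in agreement with (92).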

Since $\zeta$ is a polynomial of the second order in the $u$'s, we may consider its
moments. These will be functions of the expectations of the products
$\prod_{i=1}^{N} u_i^{k_i}$ where, for short, $u_N = 1 - \sum_{i=1}^{N-1} u_i$. Using (87) we easily find that

(96)    $E\left(\prod_{i=1}^{N} u_i^{k_i}\right) = \dfrac{\Gamma(\tfrac{1}{2} N(n-1))}{\Gamma(\tfrac{1}{2} N(n-1) + \sum_i k_i)} \prod_{i=1}^{N} \dfrac{\Gamma(\tfrac{1}{2}(n-1) + k_i)}{\Gamma(\tfrac{1}{2}(n-1))}.$

In particular, if we let $(n - 1)/2 = a$,

(97)    $E(u_i^2) = \dfrac{a(a+1)}{Na(Na+1)},$

(98)    $E(u_i^4) = \dfrac{a(a+1)(a+2)(a+3)}{Na(Na+1)(Na+2)(Na+3)},$

(99)    $E(u_i^2 u_j^2) = \dfrac{a^2(a+1)^2}{Na(Na+1)(Na+2)(Na+3)}.$

Consequently and because $\zeta = \sum_{i=1}^{N} u_i^2$, we have

(100)    $E(\zeta) = \mu_1' = (a+1)/(Na+1),$

(101)    $E(\zeta^2) = \mu_2' = \sum_{i} E(u_i^4) + 2\sum_{i<j} E(u_i^2 u_j^2) = \dfrac{(a+1)(a+2)(a+3)}{(Na+1)(Na+2)(Na+3)} + \dfrac{(N-1)a(a+1)^2}{(Na+1)(Na+2)(Na+3)}.$

The variance $\sigma_\zeta^2$ of $\zeta$ is therefore

(102)    $\sigma_\zeta^2 = \dfrac{2a(a+1)(N-1)}{(Na+1)^2(Na+2)(Na+3)}.$

By a similar procedure we find that

(103)    $E(\zeta^3) = \mu_3' = \dfrac{(a+1)(a+2)(a+3)(a+4)(a+5) + 3(N-1)a(a+1)^2(a+2)(a+3) + (N-1)(N-2)a^2(a+1)^3}{(Na+1)(Na+2)(Na+3)(Na+4)(Na+5)},$

(104)    $E(\zeta^4) = \mu_4' = \dfrac{\prod_{j=1}^{7}(a+j) + 4(N-1)a(a+1)\prod_{j=1}^{5}(a+j) + 3(N-1)a\prod_{j=1}^{3}(a+j)^2 + 6(N-1)(N-2)a^2(a+1)^3(a+2)(a+3) + (N-1)(N-2)(N-3)a^3(a+1)^4}{\prod_{j=1}^{7}(Na+j)}.$
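
The moment formulae (100) to (104) lend themselves to exact evaluation. The Python sketch below, with hypothetical function names, computes the raw moments in rational arithmetic and converts them to the mean, the standard deviation and K. Pearson's $\beta_1$ and $\beta_2$; the conversion to central moments uses the standard identities and is an addition of this sketch rather than a step spelled out in the text.

```python
from fractions import Fraction
from math import prod, sqrt


def rising(x, lo, hi):
    """Product (x + lo)(x + lo + 1) ... (x + hi)."""
    return prod(x + j for j in range(lo, hi + 1))


def raw_moments(n, N):
    """Raw moments mu'_1 .. mu'_4 of zeta from (100), (101), (103), (104); a = (n - 1)/2."""
    a = Fraction(n - 1, 2)
    Na = N * a
    m1 = (a + 1) / (Na + 1)
    m2 = (rising(a, 1, 3) + (N - 1) * a * (a + 1) ** 2) / rising(Na, 1, 3)
    m3 = (rising(a, 1, 5)
          + 3 * (N - 1) * a * (a + 1) ** 2 * (a + 2) * (a + 3)
          + (N - 1) * (N - 2) * a ** 2 * (a + 1) ** 3) / rising(Na, 1, 5)
    m4 = (rising(a, 1, 7)
          + 4 * (N - 1) * a * (a + 1) * rising(a, 1, 5)
          + 3 * (N - 1) * a * rising(a, 1, 3) ** 2
          + 6 * (N - 1) * (N - 2) * a ** 2 * (a + 1) ** 3 * (a + 2) * (a + 3)
          + (N - 1) * (N - 2) * (N - 3) * a ** 3 * (a + 1) ** 4) / rising(Na, 1, 7)
    return m1, m2, m3, m4


def frequency_constants(n, N):
    """Mean, standard deviation, beta_1 and beta_2 of zeta (exact arithmetic, then floats)."""
    m1, m2, m3, m4 = raw_moments(n, N)
    c2 = m2 - m1 ** 2                                      # variance, cf. (102)
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                    # third central moment
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    return float(m1), sqrt(c2), float(c3 ** 2 / c2 ** 3), float(c4 / c2 ** 2)
```

Exact fractions are used because the central moments are small differences of nearly equal quantities when $N$ is large.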


One possible method of approximating $\zeta_\epsilon$ is to use the formulae above, together
with the higher moments whose formulae are easy to deduce. Some convenient
known distribution, say $p_0(\zeta)$, could be fitted to have its first two or three
moments coincide with those of the unknown true distribution of $\zeta$. We would
then look for better approximations by means of the functions

(105)    $p_m(\zeta) = p_0(\zeta) \sum_{j=0}^{m} A_j \pi_j(\zeta)$

where the $\pi_j$'s denote polynomials which are orthogonal and normal with respect
to $p_0(\zeta)$ so that

(106)    $\int \pi_j \pi_k\, p_0(\zeta)\, d\zeta = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases}$

The constant coefficients $A_j$ are formed to minimize the integral

(107)    $\int \left(p(\zeta) - p_0(\zeta) \sum_{j=0}^{m} A_j \pi_j(\zeta)\right)^2 d\zeta.$

They are expressible in terms of the known moments of $p(\zeta)$.

This is one possible way to approximate $p(\zeta)$ which would eventually lead to
the computation of $\zeta_\epsilon$ even for small values of $N$.

Remembering that we are concerned with large $N$'s, we can prove that the
normalized distribution of $\zeta$, that is, the distribution of

(108)    $(\zeta - \mu_1')/\sigma_\zeta$

tends to be normal as $N \to \infty$. However, the process of tending to the limit
is rather slow as may be seen from the following table of K. Pearson's $\beta_1$ and $\beta_2$.

TABLE II

Frequency constants of the distribution of $\zeta$

  n     N     $\mu_1'$    $\sigma_\zeta$     $\beta_1$    $\beta_2$
  3    100    .0198    .001922    .8652    5.042
  3    200    .0099    .000693    .4618    4.244
  3    400    .0050    .000248    .2410    3.587

Because of this and also because the proof that the distribution of (108) tends
to normality is not very straightforward, we shall not reproduce it. But it may
be well to point out that the cause of this slowness in tending to the limit lies
in the skewness of the distribution of each particular $u_i$ and in the mutual
dependency of all the $u_i$'s.

The most promising method seems to be the following. First consider the
two sums

(109)    $T_1 = \sum_{i=1}^{N} S_i^2$  and  $T_0 = \sum_{i=1}^{N} S_i^4.$

Obviously, these two sums satisfy the conditions of the limiting theorem of
S. Bernstein [29], [30] and, therefore, as $N \to \infty$, their joint normalized
distribution tends to a normal surface. Also, we may expect the process of tending
to the limit to be rapid in this case. If $p(T_0, T_1)$ denotes the limiting normal
distribution, the probability that $\zeta > z$ can be approximately calculated by the
integral

(110)    $P\{\zeta > z\} = P\{T_0 > zT_1^2\} = \int_{-\infty}^{+\infty} dT_1 \int_{zT_1^2}^{+\infty} p(T_0, T_1)\, dT_0.$

To calculate the limiting distribution $p(T_0, T_1)$ we need only the expectations,
say $A$ and $B$, of $T_1$ and $T_0$ respectively, their standard errors, say $\sigma_1$ and $\sigma_2$,
and their correlation coefficient $R$. These may be obtained from the moments
of the $S_i^2$'s.

Formula (110) can be used not only for tabulating the integral probability
law of $\zeta$ and for determining $\zeta_\epsilon$, but also for an approximate calculation of the
power function of the test. For, if the limiting probability law $p(T_0, T_1)$ is
calculated using the moments of $S_i^2$ calculated from (70) with some $\nu > 0$, then
the integral (110) calculated with $z = \zeta_\epsilon$ gives us the probability $P\{\zeta > \zeta_\epsilon \mid \nu\}$
of the test detecting the falsehood of the hypothesis tested, that is, the power
function.

To save space, we shall now calculate the constants $A$, $B$, $\sigma_1$, $\sigma_2$, and $R$ as
functions of $\nu \ge 0$. The values appropriate to the case when the hypothesis
tested is true will then be obtained from the general formulae by the mere
substitution of $\nu = 0$.

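Since all the constants above depend on the expectations of $S_i^{2k}$, we use formula
(70) to calculate them. Denoting the expectation of $S_i^{2k}$ by $\mu_k$, we have

(111)    $\mu_k = \dfrac{2(n h_0\nu/2)^{\frac{1}{2}(n-1)}}{B(1/\nu, \tfrac{1}{2}(n-1))} \int_0^{\infty} \dfrac{S^{2k+n-2}\, dS}{(1 + \tfrac{1}{2} n h_0\nu S^2)^{\frac{1}{2}(n-1) + 1/\nu}}.$

Introducing the new variable

(112)    $1 + \tfrac{1}{2} n h_0\nu S^2 = t^{-1}$

makes the integration straightforward and gives

(113)    $\mu_k = \left(\dfrac{2}{n h_0\nu}\right)^{k} \dfrac{\Gamma(1/\nu - k)\, \Gamma(\tfrac{1}{2}(n-1) + k)}{\Gamma(1/\nu)\, \Gamma(\tfrac{1}{2}(n-1))}.$

This formula holds good if $1/\nu > k$. Otherwise the $k$th moment $\mu_k$ is divergent.
So this approximate method of calculating the power function of the test is
applicable only for $\nu < .25$.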
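
A quick numerical check of (113), and of the condition $1/\nu > k$, can be sketched in Python. The function names are illustrative, and the closed forms used for comparison are those obtained from (113) by substituting $k = 1, 2, 3, 4$ and simplifying the Gamma ratios.

```python
import math


def moment(k, n, h0, nu):
    """mu_k = E(S^(2k)) from (113); finite only when 1/nu > k."""
    if not 1.0 / nu > k:
        raise ValueError("the k-th moment diverges unless 1/nu > k")
    half = 0.5 * (n - 1)
    log_ratio = (math.lgamma(1.0 / nu - k) - math.lgamma(1.0 / nu)
                 + math.lgamma(half + k) - math.lgamma(half))
    return (2.0 / (n * h0 * nu)) ** k * math.exp(log_ratio)


def moment_closed_form(k, n, h0, nu):
    """Closed forms of mu_k for k = 1, ..., 4 obtained by expanding the Gamma ratios."""
    numerators = {1: n - 1,
                  2: n * n - 1,
                  3: (n * n - 1) * (n + 3),
                  4: (n * n - 1) * (n + 3) * (n + 5)}
    denominator = math.prod(1.0 - j * nu for j in range(1, k + 1))
    return numerators[k] / ((n * h0) ** k * denominator)
```

Since $\sigma_2$ involves $\mu_4$, the requirement $1/\nu > 4$ is what restricts the method to $\nu < .25$.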

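Substituting $k = 1, 2, 3, 4$ in (113), we have

(114)    $\mu_1 = \dfrac{1}{n h_0} \cdot \dfrac{n-1}{1-\nu},$

         $\mu_2 = \left(\dfrac{1}{n h_0}\right)^{2} \dfrac{n^2-1}{(1-\nu)(1-2\nu)},$

         $\mu_3 = \left(\dfrac{1}{n h_0}\right)^{3} \dfrac{(n^2-1)(n+3)}{(1-\nu)(1-2\nu)(1-3\nu)},$

         $\mu_4 = \left(\dfrac{1}{n h_0}\right)^{4} \dfrac{(n^2-1)(n+3)(n+5)}{(1-\nu)(1-2\nu)(1-3\nu)(1-4\nu)},$

and now we have

(115)    $A = \dfrac{N}{n h_0} \cdot \dfrac{n-1}{1-\nu}, \qquad B = \dfrac{N}{(n h_0)^2} \cdot \dfrac{n^2-1}{(1-\nu)(1-2\nu)},$

(116)    $\sigma_1^2 = \dfrac{N}{(n h_0)^2} \cdot \dfrac{(n-1)(2 + \nu(n-3))}{(1-\nu)^2(1-2\nu)},$

(117)    $\sigma_2^2 = \dfrac{2N}{(n h_0)^4} \cdot \dfrac{(n^2-1)(2 + \nu(n-3))(2(n+2) - \nu(5n+7))}{(1-\nu)^2(1-2\nu)^2(1-3\nu)(1-4\nu)},$

(118)    $R^2 = \dfrac{2(n+1)(1-2\nu)(1-4\nu)}{(1-3\nu)(2(n+2) - \nu(5n+7))}.$

Inspecting formulae (115) to (118) makes us see that there is an advantage
in substituting two new variables

(119)    $t_1 = \dfrac{n h_0}{N(n-1)}\, T_1, \qquad t_2 = \dfrac{(n h_0)^2}{N(n^2-1)}\, T_0,$

for $T_1$ and $T_0$. Their expectations, say $\vartheta_1$ and $\vartheta_2$, are

(120)    $\vartheta_1 = \dfrac{1}{1-\nu}, \qquad \vartheta_2 = \dfrac{1}{(1-\nu)(1-2\nu)}.$

Probably without any danger of confusion, the S.E.'s of $t_1$ and $t_2$ may be
denoted by $\sigma_1$ and $\sigma_2$ also and we shall have

(121)    $\sigma_1^2 = \dfrac{2 + \nu(n-3)}{N(n-1)(1-\nu)^2(1-2\nu)},$

         $\sigma_2^2 = \dfrac{2(2 + \nu(n-3))(2(n+2) - \nu(5n+7))}{N(n^2-1)(1-\nu)^2(1-2\nu)^2(1-3\nu)(1-4\nu)}.$

Of course, the correlation coefficient of $t_1$ and $t_2$ is the same as that of $T_1$ and $T_0$,
namely $R$. Obviously, the inequality $T_0 > zT_1^2$ is equivalent to $t_2 > z_1 t_1^2$
provided that

(122)    $z = z_1\, \dfrac{n+1}{N(n-1)}.$

Now the problem of calculating (110) is reduced to finding

(123)    $P\{\zeta > z\} = P\{t_2 > z_1 t_1^2\} = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-R^2}} \int_{-\infty}^{+\infty} dt_1 \int_{z_1 t_1^2}^{+\infty} \exp\left\{-\dfrac{1}{2(1-R^2)}\left[\dfrac{(t_1-\vartheta_1)^2}{\sigma_1^2} - 2R\,\dfrac{(t_1-\vartheta_1)(t_2-\vartheta_2)}{\sigma_1\sigma_2} + \dfrac{(t_2-\vartheta_2)^2}{\sigma_2^2}\right]\right\} dt_2.$
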

We may conveniently see the workings of the test proposed by considering
formula (123). First consider the case when the hypothesis tested is true. Both
$\vartheta_1$ and $\vartheta_2$ reduce to unity. The region of highest frequency is around the point
$t_1 = t_2 = 1$. If $N$ is large then both $\sigma_1$ and $\sigma_2$ are small so that the region of
significant frequency is rather small. The integral (123) is to be taken over
the region above the parabola $t_2 = z_1 t_1^2$ passing through the origin of coordinates.
When $z_1$ is small and the parabola passes far below the point $t_1 = t_2 = 1$, the
probability $P\{\zeta > z\}$ will be close to unity. When $z_1 = 1$ this probability will
be less than $\tfrac{1}{2}$ and it will diminish rapidly with further increases of $z_1$. Now
suppose that we have found the value $\zeta_\epsilon$ for which $P\{\zeta > \zeta_\epsilon\} = \epsilon$ and
consider what will happen to (123) when $z = \zeta_\epsilon$ if $\nu$ is increased. Clearly, neither
$\sigma_1$ and $\sigma_2$ nor $R$ is very sensitive to slight changes in $\nu$. Also $\vartheta_1$ will not
change very much. On the other hand, $\vartheta_2$ will increase rather fast. The final
conclusion is that the whole frequency surface corresponding to the integrand in
(123) will not change shape much but will shift to bring a greater amount of
frequency into the region of integration.

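To facilitate numerical calculations introduce

(124)    $x = \dfrac{t_1 - \vartheta_1}{\sigma_1}, \qquad y = \dfrac{t_2 - \vartheta_2 - R\sigma_2(t_1 - \vartheta_1)/\sigma_1}{\sigma_2\sqrt{1-R^2}}.$

Now (123) may be rewritten as

(125)    $P\{\zeta > z\} = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-x^2/2} \left\{\dfrac{1}{\sqrt{2\pi}} \int_{y(x, z_1)}^{+\infty} e^{-y^2/2}\, dy\right\} dx,$

where

(126)    $y(x, z_1) = \dfrac{z_1(\vartheta_1 + \sigma_1 x)^2 - \vartheta_2 - R\sigma_2 x}{\sigma_2\sqrt{1-R^2}}.$
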
Using formulae (125), (126) and (119) to (122), the following numerical 
values were obtained. 

TABLE III

n = 3, N = 100, $\nu$ = 0.

   $z_1$        $P\{\zeta \ge z \mid \nu = 0\}$
   .8          .9126
   .9          .7306
  1.0          .4905
  1.1          .2847
  1.2          .1496
  1.3          .0730
  1.4          .0336
  1.5          .0148
  1.6          .00644
  1.7          .00288
  1.34450      .05000
  1.54563      .01000


TABLE IV

Power of the test for n = 3 and N = 100.

   $\epsilon$      $\zeta_\epsilon$       $\nu$ = .01    $\nu$ = .16
  .05    .02689     .05823      .37482
  .01    .03091     .01234      .10699


The figures above are only approximate and we realize that the greater the
value of $\nu$ the less satisfactory is the approximation of the power function. A
check of the goodness of the approximation and, if it proves satisfactory, a few
numerical tables for practical applications of the test must be postponed to
another publication.
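
The integral (125) with the limit (126) can be evaluated by elementary numerical integration. The following Python sketch does so under stated assumptions: the constants are taken from (120) and (121), the correlation $R$ is computed from the moments (114) as the covariance of $T_1$ and $T_0$ divided by $\sigma_1\sigma_2$, the inner integral is expressed through the complementary error function, and the outer one is approximated by Simpson's rule on a finite range. The function names, the range and the step count are illustrative, and the output has not been checked against Tables III and IV.

```python
import math


def normal_constants(n, N, nu):
    """theta_1, theta_2, sigma_1, sigma_2 from (120)-(121), and R derived from (114)."""
    if not 0.0 <= nu < 0.25:
        raise ValueError("the fourth moment exists only for 0 <= nu < .25")
    th1 = 1.0 / (1.0 - nu)
    th2 = 1.0 / ((1.0 - nu) * (1.0 - 2.0 * nu))
    c = 2.0 + nu * (n - 3)
    d = 2.0 * (n + 2) - nu * (5 * n + 7)
    s1 = math.sqrt(c / (N * (n - 1) * (1.0 - nu) ** 2 * (1.0 - 2.0 * nu)))
    s2 = math.sqrt(2.0 * c * d / (N * (n * n - 1) * (1.0 - nu) ** 2
                                  * (1.0 - 2.0 * nu) ** 2
                                  * (1.0 - 3.0 * nu) * (1.0 - 4.0 * nu)))
    # Assumed closed form: R^2 = Cov(T1, T0)^2 / (Var(T1) Var(T0)) after simplification.
    r = math.sqrt(2.0 * (n + 1) * (1.0 - 2.0 * nu) * (1.0 - 4.0 * nu)
                  / ((1.0 - 3.0 * nu) * d))
    return th1, th2, s1, s2, r


def prob_zeta_exceeds(z1, n, N, nu=0.0, steps=4000, span=10.0):
    """Approximate P{zeta > z} by (125)-(126), where z = z1 (n + 1) / (N (n - 1))."""
    th1, th2, s1, s2, r = normal_constants(n, N, nu)
    scale = s2 * math.sqrt(1.0 - r * r)

    def integrand(x):
        y = (z1 * (th1 + s1 * x) ** 2 - th2 - r * s2 * x) / scale  # limit (126)
        upper_tail = 0.5 * math.erfc(y / math.sqrt(2.0))           # inner integral of (125)
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * upper_tail

    # Composite Simpson's rule on [-span, span]; `steps` must be even.
    h = 2.0 * span / steps
    total = integrand(-span) + integrand(span)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(-span + i * h)
    return total * h / 3.0
```

Calling the function with $\nu = 0$ gives the approximate distribution of $\zeta$ under H, and calling it with $\nu > 0$ at a fixed $z_1$ gives the corresponding approximate power.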

It is a pleasure to record the author's indebtedness to Miss Elizabeth Scott
and also to Miss Julia Bowman for carrying out all the numerical work
connected with the present paper.

A CONCISE ANALYSIS OF CERTAIN ALGEBRAIC FORMS

By Franklin E. Satterthwaite

State University of Iowa, Iowa City, Iowa

Many of the statistics in common use are functions of homogeneous algebraic
forms in the items of the sample. Among such statistics are the mean, a linear
form; the variance, a quadratic form; and the product moment, a bilinear form.
With the extension of the science, the mathematical statistician is faced with
the study of more complex statistics and the associated algebraic forms and
matrices. The purpose of this paper is to set forth concise and efficient
notations and methods which may be used in such analysis.

We shall borrow the essential features of our notation from differential
geometry and tensor analysis. The Kronecker delta is defined as,

$\delta_j^i = 1, \quad i = j,$

$\phantom{\delta_j^i} = 0, \quad i \ne j.$

The summation convention provides that summation be performed with respect
to any index appearing twice in the same term. Thus,

$x_i y^i = x_1 y^1 + x_2 y^2 + \cdots .$

To extend the use of the summation convention, we shall frequently place
indices on the numeral, 1. Thus,

$1^i x_i = 1^1 x_1 + 1^2 x_2 + \cdots = x_1 + x_2 + \cdots .$

Symmetry in the calculations is more striking if the pair of summation indices
appears, one as a superscript, the other as a subscript. Therefore we allow the
shifting of an index from the one position to the other at will. Thus,

$x_i \equiv x^i.$

Where no confusion will arise, indices may be placed outside of parentheses. 



The standard notations for averages will be used.

Unless otherwise indicated, the symbol, $\Sigma$, will always stand for summation
over all unrepeated indices including any already averaged under conventions
(1) and (2). Thus,

= JVi*. 


The following simple formulas are fundamental to the arithmetic of this 
notation. They are obvious upon the expansion of the summations. Each 
index varies from 1 to a. These formulas are 

S{xi - Xi, 

— Jf* 

oi Oj ^ Oi f 

{ji* = i<, 

lily = al?, 

= a, 



The symbols of this notation obey the associative, commutative, and the 
distributive laws of simple arithmetic so that the operations of summation, 
multiplication, and squaring are very easy. Thus for the product of two linear 
forms we have 



The sum of squares is obtained by the simple repetition of the form. 


(3) 


2a;J = = («jx,)(«Jx,), 

= (dixi)(dix*) = Slx,x\ 


Two other sums of squares occur so frequently that they should be particularly 
noted: 


( 4 ) 
(5)


X(x{ — i)* == 2 


[(‘-0:4 

= (m — “{ — 6- + 

\ a a aa/ik 


The striking similarity in the coefficients of the second and final expressions for 
the summations in (3), (4), and (5) should not be overlooked. 

Where we have multiple classification of the variables, we may operate on
each index separately. For example, in a four-way analysis of variance we may
have the quadratic form,


Q = 2{f<,i. — f«.. — ii.k. -f 



The rank is one of the important properties of a quadratic form or matrix. 
An experienced mathematician usually has a rule of thumb for determining the
ranks of those quadratic forms occurring in statistical analysis. In order to 
formulate such rules of thumb into a simple and rigorous algebra, the author 
here defines a type of matrix multiplication which he calls “uncontracted matrix 
multiplication” and which he represents by the symbol, O. 

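Let $A = \|\alpha_i^j\|$ and $B = \|\beta_k^l\|$ be two matrices of any finite orders and with
ranks $R_a$ and $R_b$. We define the uncontracted product, A O B, as follows:

$$
C = A \mathbin{\mathrm{O}} B = \|\alpha_i^j\| \mathbin{\mathrm{O}} B = \|\alpha_i^j B\| =
\begin{Vmatrix}
\alpha_1^1 B & \alpha_1^2 B & \cdots \\
\alpha_2^1 B & \alpha_2^2 B & \cdots \\
\vdots & \vdots &
\end{Vmatrix},
$$

where

$$
\alpha_i^j B =
\begin{Vmatrix}
\alpha_i^j \beta_1^1 & \alpha_i^j \beta_1^2 & \cdots \\
\alpha_i^j \beta_2^1 & \alpha_i^j \beta_2^2 & \cdots \\
\vdots & \vdots &
\end{Vmatrix}.
$$

Thus the elements of C are $\alpha_i^j \beta_k^l$.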

We therefore see that whenever we have a matrix whose elements can be 
factored in the above manner, then the matrix can be expressed as the uncon- 
tracted product of simple matrices. Thus, 

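if $\gamma_{ik\cdots}^{jl\cdots} = \alpha_i^j \beta_k^l \cdots$, then $\|\gamma_{ik\cdots}^{jl\cdots}\| = \|\alpha_i^j\| \mathbin{\mathrm{O}} \|\beta_k^l\| \mathbin{\mathrm{O}} \cdots$.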

We shall now prove that the rank of the uncontracted product, C = A O B,
of two matrices is equal to the product of the ranks. This follows because for
the matrix, A, there always exists a set of elementary transformations defined
by the equations,


T^: 




di j^o, i = j, 


where the i = j, are coefficients providing for the multiplication of the ele- 
ments of a row by a constant not zero; the i 7 ^ jf are coefficients providing 
for the addition to the elements of a row a linear function of the corresponding 
elements of the other rows; the d’s are similar coefficients referring to columns; 

the symbol is an operator indicating the interchange of the ith and jth 

rows (columns) ; and the a5^s have the values, 

aBI = 1, i ^ j < Ra, 

= 0, otherwise. 

This set of transformations reduces A to a diagonal matrix with Ra non-zero 
elements. A similar set of transformations, 


Tb • bBI = 


exists for the matrix B. 
equations, 

T'a: 


1 

^8/ \kj 

We next define two sets of transformations by the 

Ui X) = 
which are also elementary because of their relationship to $T_a$ and $T_b$. Now
if we subject the matrix, $C = \|(\alpha_i^j \beta_k^l)\|$, to the transformations $T'_a$ followed by
the transformations $T'_b$, it will be reduced to a diagonal form
with exactly $R_a R_b$ non-zero elements. Therefore, since the rank of a matrix is
invariant under elementary transformations, the rank of C = A O B must
be $R_a R_b$.

We shall now determine the ranks of several matrices which occur frequently 
in statistics: 


4, = II 1,11 = 111,1,1, ...II, fti. 1. 
A, = 11 15 11 = 111.- 1^1 = II 1.11 OH I'll, 
Rt = 1.1 = 1. 

= ii«iii, 


The proof that $R_4 = a - 1$ involves two steps. First, summing the rows of $A_4$
we have,



so that $R_4 \leq a - 1$. Second, if we subtract the elements of the first row from
the corresponding elements of each of the other rows we obtain,


Aa 




\i = 1 

lit- 1. 


Since the (a − 1)st order determinant in the lower right-hand corner is not
equal to zero, $R_4 \geq a - 1$.

Applying our theorem on uncontracted products, the ranks of complicated 
matrices can often be determined by inspection. Thus: 


Ri = a«(6 — 1). 

= (o - 1)(6 - 1). 

"■■'11 = i [(* - - 0 /'] 


= M = 1. 
The matrix $A_7$ may be confusing at first sight. Note that each element, $\alpha_i^j$,
is a quadratic form in the y's. This form is of rank 1 and can be factored into
two linear factors, one independent of j, the other independent of i.
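
The rank theorem and the ranks of the mean-deviation matrices used above can be checked numerically. The following is a minimal sketch, assuming that the uncontracted product coincides with what is now usually called the Kronecker product; the helper names `uncontracted`, `rank`, `identity` and `centering`, and the example matrices, are illustrative and do not come from the paper. Exact rational arithmetic is used so that the computed rank is not affected by rounding.

```python
from fractions import Fraction


def uncontracted(A, B):
    """Block matrix whose (i, j) block is A[i][j] * B (assumed Kronecker product)."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def rank(M):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in M]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found


def identity(a):
    """Matrix of Kronecker deltas of order a."""
    return [[Fraction(int(i == j)) for j in range(a)] for i in range(a)]


def centering(a):
    """Matrix with elements delta_ij - 1/a (deviations from the mean)."""
    return [[Fraction(int(i == j)) - Fraction(1, a) for j in range(a)]
            for i in range(a)]


A = [[1, 2], [2, 4], [0, 1]]            # rank 2
B = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]   # rank 2 (third row = first + second)
a, b = 4, 3

# Each line prints the computed rank next to the value the theorem predicts.
print(rank(uncontracted(A, B)), rank(A) * rank(B))
print(rank(centering(a)), a - 1)
print(rank(uncontracted(identity(a), centering(b))), a * (b - 1))
print(rank(uncontracted(centering(a), centering(b))), (a - 1) * (b - 1))
```

In each printed pair the two numbers should agree, which is the statement that ranks multiply under the uncontracted product.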

To illustrate the application of these techniques to a fairly complicated prob- 
lem, we shall construct and verify a design for the analysis of variance involving 
a regression line. It is known that sufficient conditions for such a design to 
be valid are: 

1. The sum of the quadratic forms be equal to the sum of the squares of the 
variables, and 

2. The sum of the ranks of the forms be equal to the number of variables. 
We shall use the first condition to set up our design. Thus, 


( 6 ) 


2x!,- = 



+ 

+ 





XklX 


Rewriting this in the usual notation, we have for our tentative design,


( 7 ) 


2x*,- = Zlxij - £i. - £.,■ + il* + 2ff]* + 2[«.,- - f]* 

+ 2[(r<r,/i7j(y< — y)Y + 2[(ij. — f ) — (r(7,/(rj(y< — y)]*. 


In order to determine the corresponding equation for the ranks, we rewrite (6) 
in the form, 

- {(' - - 0! + OM + Ol‘ - 0,' 

First we must determine the rank of the unfamiliar matrix, 


We see that the rank of $A_8$ cannot be greater than a − 2 because two linear
relations exist between the rows, namely,


l*ai = 0, since 
y*ai = 0, since 



aa 


2 

V* 
To show that the rank of $A_8$ cannot be less than a − 2, we subtract the elements
of the first row from the corresponding elements of each of the last a − 2 rows,
giving,


7 — Tv? 

“‘I (a - M !/*(«{- 

A- . <»/• ^ >V1,2 

cur* CUii 

Multiplying each element of the second column by — ^a — y* ^ ^a — y* 


and adding the result to the corresponding element of the jth column for j = 3,
4, ..., a, we see that the (a − 2)th order determinant in the lower right-hand
corner becomes $|\delta_i^j| = 1$ which is not equal to zero. Therefore the rank of $A_8$ must
equal a − 2.

Referring to equation (8), we now write down the corresponding equation for
ranks using the theorem on uncontracted products. Thus,

$$\sum \text{Ranks} = (a - 1)(b - 1) + (1)(1) + (1)(b - 1) + (1)(1)(1) + (a - 2)(1) = ab.$$


Hence the quadratic forms in the right member of equation (7) are mutually 
independent and each, measured in units of the variance of the population, is 
distributed as is Chi-square with the appropriate number of degrees of freedom. 



A SYMMETRIC METHOD OF OBTAINING UNBIASED ESTIMATES
AND EXPECTED VALUES

By Paul L. Dressel 

Michigan State College, East Lansing, Michigan 

The problem of finding the relationship between moment functions of a 
sample and moment functions of the population from which the sample was 
obtained has, of necessity, received much attention. The problem has two 
parts: first, to find the expected value of a given sample moment function; 
second, to find the estimate of a given population moment function. Thus, if
$m_i$ represent the ith central moment of a sample and $\mu_i$ represent the ith central
moment of the population, the first part of the problem requires that we find
the mean value of $m_i$ for all possible samples of a given size and express it in
terms of the $\mu$'s. The second part requires that we find a function of the $m$'s
such that the mean value, taken for all possible samples of a given size, be a
given $\mu_i$. For the case $i = 4$ we have the well known results:

$$E(m_4) = \frac{(n - 1)(n^2 - 3n + 3)}{n^3}\,\mu_4 + \frac{3(n - 1)(2n - 3)}{n^3}\,\mu_2^2,$$

$$E^{-1}(\mu_4) = \frac{n(n^2 - 2n + 3)}{(n - 1)(n - 2)(n - 3)}\,m_4 - \frac{3n(2n - 3)}{(n - 1)(n - 2)(n - 3)}\,m_2^2.$$

These results are based on the assumption of an infinite population. In spite 
of the inverse relationship existing between estimates and expected value, the 
expressions above show no simple relationship. This lack of simplicity of rela- 
tionship between estimate and expected value is directly traceable to the fact 
that such results are usually obtained for infinite populations. When results 
are obtained for finite populations a symmetry is found to exist which reduces 
to a single problem the two parts stated above. Since this should be evident
to anyone upon reflection, the main purpose of the present paper may be considered
as that of indicating one method of demonstrating the result stated
above as well as showing the relationship of this method to material appearing in
previously published papers.
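
The two infinite-population results quoted for the fourth central moment can be checked exactly on a small example. The sketch below assumes the standard forms E(m4) = [(n−1)(n²−3n+3)/n³]μ4 + [3(n−1)(2n−3)/n³]μ2² and the corresponding unbiased estimate of μ4; the three-point universe `support` and the helper names are hypothetical choices made only for illustration. Independent draws from an equiprobable discrete universe play the role of sampling from an infinite population.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, order):
    """Central moment of the given order, with divisor len(values)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** order for v in values) / n


def expected(statistic, support, n):
    """Exact mean of a statistic over n independent equiprobable draws."""
    samples = list(product(support, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)


support = [0, 1, 5]   # hypothetical three-point universe
n = 4                 # the estimate of mu_4 needs n > 3
mu2, mu4 = central_moment(support, 2), central_moment(support, 4)

# Expected value of the sample fourth central moment.
expected_m4 = expected(lambda s: central_moment(s, 4), support, n)
formula_m4 = (Fraction((n - 1) * (n * n - 3 * n + 3), n ** 3) * mu4
              + Fraction(3 * (n - 1) * (2 * n - 3), n ** 3) * mu2 ** 2)

# Unbiased estimate of the population fourth central moment.
d = (n - 1) * (n - 2) * (n - 3)


def estimate_mu4(s):
    m2, m4 = central_moment(s, 2), central_moment(s, 4)
    return (Fraction(n * (n * n - 2 * n + 3), d) * m4
            - Fraction(3 * n * (2 * n - 3), d) * m2 ** 2)


print(expected_m4 == formula_m4)
print(expected(estimate_mu4, support, n) == mu4)
```

Because the arithmetic is exact, the two comparisons are identities rather than approximations; they do not by themselves show the finite-population symmetry discussed next.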

Consider a finite population consisting of N items $x_1, \cdots, x_N$ and samples of n
items taken from that population, the sampling being done without replacement.
We shall utilize the power product notation of P. S. Dwyer [1; p. 13]

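$$(q_1 \cdots q_r) = \sum_{i_1 \neq i_2 \neq \cdots \neq i_r}^{n} x_{i_1}^{q_1} x_{i_2}^{q_2} \cdots x_{i_r}^{q_r} \qquad (1)$$

to represent a power product formed for the sample and

$$[q_1 \cdots q_r] = \sum_{i_1 \neq i_2 \neq \cdots \neq i_r}^{N} x_{i_1}^{q_1} x_{i_2}^{q_2} \cdots x_{i_r}^{q_r} \qquad (2)$$

to represent like power products formed for the population. An arbitrary
moment function of weight r of the sample is indicated by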

r! 

(3) f * ••• (ji) 


(g.!)’ 


(gi!)*^‘iri! ••• *•«! 


and likewise a moment function of the population is indicated by 

r! 


(4) 


2A, 


* •••*' (giir ••• (g,l)"iril...T.l 


Igi]'* ••• 


where the summation extends over all partitions of r. 

It now is convenient to express each of the expressions (3) and (4) in terms 
of power products. We shall utilize for this purpose an expansion theorem 
which is the converse of a theorem stated by Dwyer, [1 ; p. 34] and [2; pp. 37-39], 
which can be proved in a similar fashion. 

This converse theorem follows: 

If any isobaric sum of products of power sums indicated by 


(5) 


r! 




[g.]* 


be expanded in terms of power products in a form indicated by 

r! 


( 6 ) 




**1 


• (Pi!)'‘ • • • (p.!)'Vi! . . . x.I 

then the coefficient 5, of the power sum [r] is given by 


(7) 


Br = 2 


r! 


and the coefficient of [rir 2 • • • rj is 

(8) = Br^Br^ ••• Br^ 


where the barred product indicates a symbolic multiplication by suffixing of sub- 
scripts. 

This is exemplified by 


Bs2 = BzBo = (Ab + SA 2 I + -4iu)(A 2 + All) 


== As 2 + Asii + 3A221 + 4A2m + Aiiiii. 

Using this theorem the moment functions (3) and (4) are easily expanded in
terms of power products. In this latter form the expected value of the sample
moment function is easily found by utilizing the fact that

$$E(q_1 \cdots q_r) = \frac{n^{(r)}}{N^{(r)}}\,[q_1 \cdots q_r],$$

where $n^{(r)} = n(n - 1) \cdots (n - r + 1)$.
Now if the expected value of the sample moment function be equated to the
population moment function (both being in power product form) we obtain a
set of equations connecting the coefficients of a sample moment function and a
population moment function. Since either the coefficients of the sample moment
function or those of the population moment function may be assigned
and the others solved for, this set of equations enables one to solve two problems.
First, we may find unbiased estimates, that is, moment functions of the sample such
that their expected value is some preassigned population moment function.
Second, we may find expected values, that is, moment functions of the population such
that they are expected values of some preassigned sample moment function.
From the symmetry of this set of equations, we shall see that any result obtained
from the system has, through the symmetry, a dual role.

The foregoing discussion may be clarified by an example. Let $A_2[2] + A_{11}[1]^2$
be the population moment function. In terms of power products this becomes
$(A_2 + A_{11})[2] + A_{11}[11]$. The sample moment function $a_2(2) + a_{11}(1)^2$ becomes
in terms of power products $(a_2 + a_{11})(2) + a_{11}(11)$ and its expected value is

$$\frac{n}{N}(a_2 + a_{11})[2] + \frac{n^{(2)}}{N^{(2)}}\,a_{11}[11].$$

By equating this to the population moment function above we obtain

$$n^{(2)} a_{11} = N^{(2)} A_{11},$$

$$n(a_2 + a_{11}) = N(A_2 + A_{11}),$$

and the symmetry of the system is apparent.

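If

$$r_i = \frac{N^{(i)}}{n^{(i)}} = \frac{1}{\rho_i},$$

the solutions of the system are

$$a_{11} = r_2 A_{11}, \qquad A_{11} = \rho_2 a_{11},$$
$$a_2 = r_1 A_2 + (r_1 - r_2) A_{11}, \qquad A_2 = \rho_1 a_2 + (\rho_1 - \rho_2) a_{11}. \qquad (9)$$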
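
The weight 2 example can be verified by enumerating every sample drawn without replacement from a small population. This is a minimal sketch, assuming that a power product is the sum over ordered selections of distinct items and that $r_i = N^{(i)}/n^{(i)}$; the population values and the helper names `power_product`, `falling` and `expectation` are illustrative assumptions, not taken from the paper. The population function is chosen, purely for illustration, to be the population variance.

```python
from fractions import Fraction
from itertools import combinations, permutations


def power_product(values, exponents):
    """Sum of x_i^q1 * x_j^q2 * ... over ordered selections of distinct items."""
    total = Fraction(0)
    for chosen in permutations(values, len(exponents)):
        term = Fraction(1)
        for x, q in zip(chosen, exponents):
            term *= Fraction(x) ** q
        total += term
    return total


def falling(m, r):
    """Factorial power m^(r) = m (m - 1) ... (m - r + 1)."""
    out = 1
    for k in range(r):
        out *= m - k
    return out


def expectation(population, n, statistic):
    """Average of a statistic over all samples of n drawn without replacement."""
    samples = list(combinations(population, n))
    return sum(statistic(s) for s in samples) / len(samples)


population = [1, 2, 4, 7, 11]   # arbitrary illustrative values
N, n = len(population), 3

# Check E(q1...qr) = n^(r) / N^(r) * [q1...qr] for a few power products.
for exps in [(2,), (1, 1), (2, 1)]:
    r = len(exps)
    lhs = expectation(population, n, lambda s: power_product(s, exps))
    rhs = Fraction(falling(n, r), falling(N, r)) * power_product(population, exps)
    assert lhs == rhs

# Population function A2[2] + A11[1]^2, here the population variance.
A2, A11 = Fraction(1, N), Fraction(-1, N * N)
r1 = Fraction(falling(N, 1), falling(n, 1))
r2 = Fraction(falling(N, 2), falling(n, 2))
a11 = r2 * A11                       # solutions of the weight 2 system
a2 = r1 * A2 + (r1 - r2) * A11


def sample_function(s):
    return a2 * power_product(s, (2,)) + a11 * power_product(s, (1,)) ** 2


target = (A2 * power_product(population, (2,))
          + A11 * power_product(population, (1,)) ** 2)
print(expectation(population, n, sample_function) == target)
```

Interchanging the roles of (n, a) and (N, A) in the same code gives the expected value of a preassigned sample function, which is the dual use of the system noted in the text.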

In a similar manner if we use moment functions of weight 3 we begin with

$$A_3[3] + 3A_{21}[2][1] + A_{111}[1]^3,$$

$$a_3(3) + 3a_{21}(2)(1) + a_{111}(1)^3,$$

and obtain the system of equations

$$n^{(3)} a_{111} = N^{(3)} A_{111},$$
$$n^{(2)}(a_{21} + a_{111}) = N^{(2)}(A_{21} + A_{111}),$$
$$n(a_3 + 3a_{21} + a_{111}) = N(A_3 + 3A_{21} + A_{111}),$$

with solutions

$$A_{111} = \rho_3 a_{111},$$
$$A_{21} = \rho_2 a_{21} + (\rho_2 - \rho_3) a_{111}, \qquad (10)$$
$$A_3 = \rho_1 a_3 + 3(\rho_1 - \rho_2) a_{21} + (\rho_1 - 3\rho_2 + 2\rho_3) a_{111}.$$

The solutions for the a's in terms of the A's are obtainable from the given results
in an obvious manner.

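If we use the Carver functions [3; p. 104]

$$P_1 = \rho_1, \qquad P_{11} = \rho_2,$$
$$P_2 = \rho_1 - \rho_2, \qquad P_{21} = \rho_2 - \rho_3,$$
$$P_3 = \rho_1 - 3\rho_2 + 2\rho_3, \qquad P_{22} = \rho_2 - 2\rho_3 + \rho_4,$$
$$P_4 = \rho_1 - 7\rho_2 + 12\rho_3 - 6\rho_4, \qquad P_{111} = \rho_3,$$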


or in general 

(11) ft = i:p4E(-i)'"‘ 

and 




(piiy 


(p,!)'*iril •••!•.! 


PtlTf-r^ — PriPrt *•* Pf^ 

where the double barred product indicates a symbolic multiplication by addi- 
tion of subscripts exemplified by 

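$$P_{32} = \overline{\overline{P_3 P_2}} = (\rho_1 - 3\rho_2 + 2\rho_3)(\rho_1 - \rho_2) = \rho_2 - 4\rho_3 + 5\rho_4 - 2\rho_5;$$

the results (9) and (10) may be written

$$A_{11} = P_{11} a_{11}, \qquad A_3 = P_1 a_3 + 3P_2 a_{21} + P_3 a_{111},$$
$$A_2 = P_1 a_2 + P_2 a_{11}, \qquad A_{21} = P_{11} a_{21} + P_{21} a_{111},$$
$$A_{111} = P_{111} a_{111}.$$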

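Similarly for weight 4 we obtain

$$A_4 = P_1 a_4 + 4P_2 a_{31} + 3P_2 a_{22} + 6P_3 a_{211} + P_4 a_{1111},$$
$$A_{31} = P_{11} a_{31} + 3P_{21} a_{211} + P_{31} a_{1111},$$
$$A_{22} = P_{11} a_{22} + 2P_{21} a_{211} + P_{22} a_{1111},$$
$$A_{211} = P_{111} a_{211} + P_{211} a_{1111},$$
$$A_{1111} = P_{1111} a_{1111}.$$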
In general
(12) Ar == 


(Piir(P2!y 


(p.!)''iril 




(13) • • *>’m — AfiAr^ ••• Ar^j 

where as before the barred product indicates a symbolic multiplication by
suffixing of subscripts.

If in 


... (g,!)'*iri!...ir*!' 




(-1)''^'*^ + ^, + . . . + X, - 1)1 




the moment function of the sample which is thereby represented is the Thiele 
semin variant U of the sample. If the ^I's are solved for by means of the appro- 
priate set of equations the expected value of U is found. Thus we find 


Nw 

(2)^ 


Mi|] = 


N*n^*^ 


- X* - in - N)inN - n - N - Dkj, 


»i - ^ (» - «w» - 1 »«. 

iV'n"’ JV’n® 

(n - miNn -n-N- S)k,, 
where the k system of seminvariants used here is defined by 
K2r = S S ( — l)’(^*^)A‘<M2r-i, 

k2,+i - L . (-1) ^ J 


By virtue of the symmetry noted earlier it follows that the estimates of the 
Thiele seminvariants and products of these seminvariants of weight < 5 are 
obtainable from the last results by replacing E by ET^ (estimate of), Ki by ki
U by \i , and N by n. In this manner we find that La , the estimate of X 4 is 

(18) i. = E-M - ^ W - “XJV" - 


It is of some interest to note in the results (16) above that in those expected
values or estimates which contain more than one term the factor N − n occurs
in the second term. This, and the form of other coefficients involved in the
terms, shows that as the sample size approaches the population size the sample
seminvariants approach the population seminvariants. Another characteristic
of such results as those given in (16) is that infinite sampling formulas are easily
obtainable therefrom. Thus if in $L_4$ given in (18) we let N → ∞ we find


Li 


n , I n I 

~7d) ^ ' u) 

^(4) ^(4) 


n*(n + 1 ) 

“liW 


nu 


3n\n - 1) 8 
nia, 


n' 


( 4 ) 


the first of these forms checking the result given by Dressel [4; p. 45] and the 
second form being identical with that given by Fisher [5]. 

The results exhibited above for finite sampling may lead to a mistaken idea 
about the simplicity of the results. Simplicity decreases rapidly as the weight 
increases. Thus for weight 6 we find 


m] = 




+ 


2Ar‘n‘'' 

■>/<«) n« 


(n - AOCATn - 20)[8 m6 - 15m4m* + IOmI - 46m8] 


(19) 


+ (n - N)lNn(n + N) - 12nN + 60] 

• + 105/MM2 — 50 M* + 60ms] 

( 2 ) 

(n - N)lNn(N^ + nN + n’) - 14nN(N + n) + 71Nn - 120] 

A'wn* 

^ 10JVn“' 2n"> , 

^ ^(4>jjS A^)(Ar + w 5) jy«)8i6 N)jKt. 


Again by letting N → ∞ infinite sampling results are obtained. Much of this
last result vanishes in that case.

It has been demonstrated that the k system of seminvariants are invariant
under estimation in the case of infinite sampling [4; p. 53]. It is therefore of
some interest to note that this system also possesses the property for finite
sampling without replacement. The proof of this is quite simple. Denote the
estimate of $k_i$ by $K_i$ and the fundamental relations are


K^r = 
These expressions hold for any n and hence for a population of N. Let Ku and
Kir+i denote functions corresponding to Ktr and but with population 
moments replacing sample moments and we have 


K' ^ 

Jf(2) 


Ki 


»r4l 


AT* 


Since the power product mode of formulation of Ku and JCir+i insures that 

E[Ku] = Ki , E[Kir+l] = Kir+l 

it follows that 


SIK.,1 = e[^ 


' _ fi" 


or 


E[kir] 


n*iV<« 


Similarly 

thus establishing the theorem stated above. 
DETERMINATION OF SAMPLE SIZES FOR SETTING
TOLERANCE LIMITS

By S. S. Wilks 

Princeton University, Princeton, N. J. 

1. Introduction. In the mass production of a given product or apparatus
piece-part, Shewhart¹ has discussed a practical procedure for detecting the existence
of assignable causes of variation in a given quality characteristic of the
product as measured by a variable x. For example, x may be the thickness in
inches of a washer or the tensile strength in pounds of a small aluminum casting
made according to a given set of specifications; x varies in value from washer
to washer or from casting to casting. Now suppose assignable causes of variability
in x have been detected by Shewhart's procedure and have been sufficiently
well eliminated by making appropriate refinements in the manufacturing
process so that for all practical purposes the remaining variability may be considered
"random," thus allowing us to assume that we have a statistical universe
U in which x is a random variable with some distribution law f(x). f(x) is, in
general, unknown and cannot be determined until long after the refined manufacturing
operation has been under way. Two types of situations arise in practice,
one in which x is a discrete variable taking on only certain isolated values
as for example 1, 2, 3, ..., etc. with corresponding probabilities p(1), p(2), ...,
the other being that in which x is essentially a continuous variable over some
range with a corresponding probability density function f(x). In this paper we
shall consider the latter type of variable.

The problem now arises as to how we should calculate a tolerance range
$(L_1, L_2)$ for x from a sample, and how large the sample should be in order for
the tolerance range to have a given degree of stability. More specifically, for a
given method of calculating tolerance limits, how large should our sample be in order
that the proportion P of the universe included between $L_1$ and $L_2$ have an average
value a, and be such that the probability is at least p that P will lie between
two given numbers, say b and c? For example, if a tolerance range is obtained
by using a truncated sample range, that is by letting $L_1$ be the greatest of the r
smallest values in a sample and $L_2$ the smallest of the r largest values, r being
chosen so that E(P) = .99, how large should the sample size, say n, be in order
for the probability to be .9 that P would lie between .985 and .995? A similar
question can be asked when the setting of only one tolerance limit is under
consideration.

¹ W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van Nostrand
Company, New York, 1931.
2. Tolerance ranges from truncated sample ranges. Suppose that nothing is
known about the distribution function f(x) except enough to enable us to assume
that it is continuous. Let a be the average value which P is to have, and suppose
a sample of size n is drawn from the universe U so that [(1 − a)(n + 1)]/2 = r,
say, is a positive integer. Let $x_1, x_2, \cdots, x_n$ be the sample values of x arranged
in order of increasing magnitude. Let $L_1 = x_r$ and $L_2 = x_{n-r+1}$. The distribution
law, say g(P), of P, the proportion of the universe included between these
values of $L_1$ and $L_2$, is given by

$$g(P)\,dP = \frac{\Gamma(n + 1)}{\Gamma[a(n + 1)]\,\Gamma[(1 - a)(n + 1)]}\,P^{a(n+1)-1}(1 - P)^{(1-a)(n+1)-1}\,dP. \qquad (1)$$


This follows at once from the joint distribution law of $x_r$ and $x_{n-r+1}$ which can be
derived as follows: Consider the x axis as being divided into k mutually exclusive
intervals $I_1, I_2, \cdots, I_k$ with $p_1, p_2, \cdots, p_k$ as the associated probabilities
$\left(\sum_{i=1}^{k} p_i = 1\right)$. In a sample of size n the probability that $n_1, n_2, \cdots, n_k$
$\left(\sum_{i=1}^{k} n_i = n\right)$ values of x will fall into $I_1, I_2, \cdots, I_k$ respectively is given by
the well-known multinomial distribution law

$$\frac{n!}{n_1!\,n_2! \cdots n_k!}\,p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}. \qquad (2)$$

To get the distribution of $x_r$ and $x_{n-r+1}$ we take k = 5 and for $I_1, I_2, \cdots, I_5$
we take the intervals $(-\infty, x_r)$, $(x_r, x_r + dx_r)$, $(x_r + dx_r, x_{n-r+1})$,
$(x_{n-r+1}, x_{n-r+1} + dx_{n-r+1})$, $(x_{n-r+1} + dx_{n-r+1}, \infty)$ respectively. The values of
$p_1, p_2, \cdots, p_5$ are the integrals of f(x) dx over these five intervals respectively and the values
of $n_1, n_2, \cdots, n_5$ are r − 1, 1, n − 2r, 1, r − 1 respectively. By substituting
these values of the p's and n's in (2) and neglecting terms of order higher than
$dx_r\,dx_{n-r+1}$ the probability element for $x_r$ and $x_{n-r+1}$ is found at once to be²

$$\frac{n!}{[(r - 1)!]^2 (n - 2r)!} \left[\int_{-\infty}^{x_r} f(x)\,dx\right]^{r-1} \left[\int_{x_{n-r+1}}^{\infty} f(x)\,dx\right]^{r-1} \left[\int_{x_r}^{x_{n-r+1}} f(x)\,dx\right]^{n-2r} f(x_r)\,f(x_{n-r+1})\,dx_r\,dx_{n-r+1}. \qquad (3)$$

Now let $\int_{-\infty}^{x_r} f(x)\,dx = u$, $\int_{x_{n-r+1}}^{\infty} f(x)\,dx = v$; then since $du = f(x_r)\,dx_r$ and
$dv = -f(x_{n-r+1})\,dx_{n-r+1}$, the probability element of u and v may be written as

$$\frac{\Gamma(n + 1)}{\Gamma^2(r)\,\Gamma(n - 2r + 1)}\,u^{r-1} v^{r-1} (1 - u - v)^{n-2r}\,du\,dv, \qquad (4)$$


² For a discussion and a rather complete bibliography of the probability theory of "extreme
values" such as $x_r$ and $x_{n-r+1}$ see E. J. Gumbel, "Les valeurs extrêmes des distributions
statistiques," Annales de l'Institut H. Poincaré (1935).
the region of u and v of non-zero probability being the triangle bounded by the
u and v axes and the line u + v = 1. Making the change of variables
1 − u − v = P and u = Q, integrating with respect to Q, and setting r =
(1/2)(1 − a)(n + 1) we find the distribution of P, the proportion of the universe
included between $x_r$ and $x_{n-r+1}$, to be (1). It should be remarked that even
if $L_1$ and $L_2$ are obtained by asymmetrical truncation by taking $L_1 = x_s$, $L_2 = x_t$,
with $t - s = n - 2r + 1$, the distribution of $\int_{x_s}^{x_t} f(x)\,dx$ remains unchanged.

Thus for a given p, by taking $L_1 = x_s$ and $L_2 = x_t$ where $t - s = n - 2r + 1 = a(n + 1)$,
and choosing the smallest value of n for which $\int_b^c g(P)\,dP \geq p$
and such that (1 − a)(n + 1) is a positive integer we have provided the answer
to the italicized question for one method of calculating $L_1$ and $L_2$; a method
which is valid for any unknown continuous distribution f(x).

As an example, suppose we take a = .99, b = .985, c = .995 and p = .99.
The size of sample required is found to be 1000 (999 to be exact). In fact in
this case the probability of P being between .985 and .995 is .992. In this
example, we may therefore make the statement that if x is a continuous variable
under statistical control, and if samples of size 1000 are taken, the tolerance
limits $L_1$ and $L_2$ taken as the fifth smallest and fifth largest values of x in the
sample respectively, will, on the average, include 99% of the universe between
them and furthermore, the tolerance limits calculated in this way for samples
of size 1000 will, in about 99.2% of the samples, include between 98.5% and
99.5% of the universe between them.

If $L_1$ and $L_2$ are taken as the smallest and largest values of x in the sample
respectively (corresponding to r = 1, i.e. sample range with no truncation),
then in samples of size 1000, these tolerance limits will, on the average include
99.8% of the universe between them and the probability is .996 that $L_1$ and $L_2$
will include at least 99.6% of the universe between them. If the largest and
smallest values of x in samples are used as tolerance limits and if we wish to
state that the probability is .99 that such tolerance limits will include at least
99% of the universe, the size of sample required is 660. If the probability is
lowered to .95 of including at least 99% of the universe, with such tolerance
limits, the size of sample required is 130. Engineering statisticians³ have
pointed out on basis of practical experience the need of using samples of 100 to
1000 or even more cases in order to set tolerance limits which will include at
least 99% of the universe with a satisfactorily high degree of certainty. The
examples we have given based on sizes 1000, 660 and 130 will indicate the degree
of stability to be expected for tolerance ranges for samples in this range of sizes.
The degree of stability of the tolerance limits for samples of the size range 500
to 1000 appears to be of about the order of that demanded by the engineering
statistician.
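
The sample sizes for the untruncated range can be recomputed directly. The sketch below assumes that, with r = 1, distribution (1) reduces to a Beta law with parameters n − 1 and 2, so that the probability of including less than a proportion c is $n c^{n-1} - (n - 1)c^n$, and that the mean proportion is (n − 1)/(n + 1); the function names are illustrative and not from the paper.

```python
def prob_coverage_at_least(n, c):
    """P(P >= c) when L1, L2 are the smallest and largest of n observations.

    Assumes P follows a Beta(n - 1, 2) law, whose distribution function
    at c is n * c**(n - 1) - (n - 1) * c**n.
    """
    return 1.0 - (n * c ** (n - 1) - (n - 1) * c ** n)


def smallest_n(c, p):
    """Least n for which the limits include at least c with probability >= p."""
    n = 2
    while prob_coverage_at_least(n, c) < p:
        n += 1
    return n


def mean_coverage(n):
    """Average proportion of the universe between the two extremes."""
    return (n - 1) / (n + 1)


print(mean_coverage(1000))
print(smallest_n(0.99, 0.99))   # include at least 99% with probability .99
print(smallest_n(0.99, 0.95))   # include at least 99% with probability .95
print(smallest_n(0.95, 0.99))   # include at least 95% with probability .99
```

Under this formula the case of coverage .99 with probability .99 lands within a few units of the size 660 quoted above, and a size of 130 is returned for coverage .95 with probability .99.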

³ Cf. W. A. Shewhart, Statistical Methods from the Point of View of Quality Control, The
Graduate School of the U.S. Department of Agriculture, Washington (1939), p. 63.
In some cases it may be desirable to determine the size of samples so as to
control the tolerance limits $L_1$ and $L_2$ individually, that is so that the probability
is at least p that the proportions of the universe contained in the tails of the
distribution cut off by $L_1$ and $L_2$ are in both cases between two given numbers,
say d and e. In this case we would determine the least value of n so that

$$\int_d^e \int_d^e h(u, v)\,du\,dv \geq p \qquad (5)$$

where h(u, v) du dv denotes the function given by (4). For example, suppose
p = .99, d = 0, e = .005, r = 1. The size of the sample needed is 1060.
Thus in samples of size 1060, the probability is .99 that $L_1$ and $L_2$ taken as
the smallest and the largest values in the sample respectively will cut off tails
of the universe such that each tail will include not more than 0.5% of the universe.

If it is desired to set only one tolerance limit, say $L_1$, then the distribution
of u would be used. This can be found by integrating (4) with respect to v.
The distribution is

$$\frac{\Gamma(n + 1)}{\Gamma(r)\,\Gamma(n - r + 1)}\,u^{r-1}(1 - u)^{n-r}\,du. \qquad (6)$$

The probability p that the proportion of the universe in the tail which will be
cut off by $L_1$ is between d and e is given by integrating the expression (6) from
d to e. The value of n required to obtain any given value of p can then be
determined. For example, in the case where p = .99, d = 0, e = .005, r = 1,
the size of the sample needed is 920.
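
Both tail calculations with r = 1 and d = 0 have closed forms that are easy to evaluate. The sketch below assumes that for the smallest observation alone the probability that the tail is at most e equals $1 - (1 - e)^n$, and that for the smallest and largest observations together the probability that both tails are at most e equals $1 - 2(1 - e)^n + (1 - 2e)^n$ (valid for e ≤ 1/2); the function names are illustrative and not from the paper.

```python
import math


def smallest_n_one_tail(e, p):
    """Least n with P(u <= e) >= p, L1 being the smallest observation."""
    n = max(1, math.ceil(math.log(1.0 - p) / math.log(1.0 - e)))
    while 1.0 - (1.0 - e) ** n < p:      # guard against rounding in the logs
        n += 1
    return n


def prob_both_tails(n, e):
    """P(u <= e and v <= e) for the smallest and largest of n observations."""
    return 1.0 - 2.0 * (1.0 - e) ** n + (1.0 - 2.0 * e) ** n


def smallest_n_both_tails(e, p):
    """Least n for which both tails are at most e with probability >= p."""
    n = 2
    while prob_both_tails(n, e) < p:
        n += 1
    return n


print(smallest_n_one_tail(0.005, 0.99))
print(smallest_n_both_tails(0.005, 0.99))
```

Under these assumptions the two results fall close to the rounded sizes 920 and 1060 quoted in the text.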


3. Tolerance range for a normal universe. The method of setting tolerance
limits discussed in Section 2 assumes nothing about the distribution f(x) except
that it is continuous. If f(x) can be assumed to have a given functional form
involving unknown parameters, methods based on the theory of statistical estimation
and having greater efficiency than those already discussed could be
used for setting tolerance limits. We shall not go into a general discussion of
such methods here although it does appear desirable to consider one very important
example of the application of the methods. Suppose f(x) can be assumed
to be a normal distribution function with unknown mean m and variance $\sigma^2$.
In a sample of size n let $\bar{x}$ be the sample mean and let $s^2 = \sum_{1}^{n}(x_i - \bar{x})^2/(n - 1)$.
Let us consider as tolerance limits $L_1'$ and $L_2'$ the quantities $\bar{x} \pm ks$. The proportion
P' of the universe included between these limits is

$$P' = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{\bar{x}-ks}^{\bar{x}+ks} e^{-(x-m)^2/2\sigma^2}\,dx. \qquad (7)$$

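We wish to determine k so that E(P') = a. It can be verified by straightforward
analysis that E(P'), defined by $\int_{-\infty}^{\infty}\int_0^{\infty} P' f(\bar{x}, s)\,ds\,d\bar{x}$, has the value

$$\frac{\Gamma(n/2)}{\sqrt{\pi(n - 1)}\,\Gamma((n - 1)/2)} \int_{-t}^{t} \frac{dx}{(1 + x^2/(n - 1))^{n/2}}, \qquad t = k\sqrt{\frac{n}{n + 1}}, \qquad (8)$$

where $f(\bar{x}, s)$ is the well-known distribution of $\bar{x}$ and s given by

$$f(\bar{x}, s) = \frac{\sqrt{n}\,(n - 1)^{(n-1)/2}\,s^{n-2}}{2^{(n-2)/2}\,\sigma^n \sqrt{\pi}\,\Gamma((n - 1)/2)} \exp\left\{-\frac{n(\bar{x} - m)^2 + (n - 1)s^2}{2\sigma^2}\right\}. \qquad (9)$$

Therefore the tolerance limits $L_1'$ and $L_2'$ which will include, on the average,
a proportion a of the universe between them are

$$\bar{x} \pm t_a \sqrt{(n + 1)/n}\cdot s \qquad (10)$$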

where $t_a$ is the value of t for which the integral in (8) has the value a. The
value of $t_a$ can be found from Fisher's t table for n − 1 degrees of freedom, and
for certain values of a including .99, .95, etc. and for values of n up to 30. Although
the tolerance limits (10) will include, on the average, the proportion a
of the universe between them, we must now investigate the size of sample
needed to obtain a given degree of stability of P'. The exact distribution of P'
seems to be too complicated to be of any practical value. It is not difficult to
verify that to within terms of order 1/n, the variance of P' is given by

$$\sigma_{P'}^2 = \frac{t_a^2\,e^{-t_a^2}}{\pi n}. \qquad (11)$$

The variance of P, the proportion of the universe included between $x_r$ and
$x_{n-r+1}$, to within terms of order 1/n is given by

$$\sigma_P^2 = a(1 - a)/n. \qquad (12)$$
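
The statement that the limits (10) include the proportion a on the average can be illustrated by simulation. This is a rough Monte Carlo sketch under stated assumptions: the universe is taken as standard normal, n = 10, and $t_a$ = 2.262 is the usual tabulated two-sided 5 per cent point of t for 9 degrees of freedom, so that a = .95; the function name, the number of repetitions and the seed are arbitrary illustrative choices.

```python
import random
from statistics import NormalDist, mean, stdev


def average_coverage(n, t_a, reps=4000, seed=1):
    """Monte Carlo mean of P' for limits xbar +/- t_a * sqrt((n + 1) / n) * s."""
    rng = random.Random(seed)
    universe = NormalDist(0.0, 1.0)          # assumed normal universe
    k = t_a * ((n + 1) / n) ** 0.5
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar, s = mean(sample), stdev(sample)   # stdev uses divisor n - 1
        # proportion of the universe between the two limits for this sample
        total += universe.cdf(xbar + k * s) - universe.cdf(xbar - k * s)
    return total / reps


print(average_coverage(10, 2.262))
```

The printed average should lie close to .95, while the proportion included by any single sample varies appreciably, which is the instability that the variance formulas describe.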


For a large sample of a given size, say n = 100 or more, a simple comparison
of the stabilities of the two tolerance ranges $(x_r, x_{n-r+1})$ and $(\bar{x} \pm t_a\sqrt{(n + 1)/n}\cdot s)$
can be made by comparing $\sigma_P^2$ and $\sigma_{P'}^2$. For a = .99, the efficiency ratio $\sigma_{P'}^2/\sigma_P^2$
is .28 indicating that for large n and when the universe is normal, samples of
size .28n have the same degree of stability in setting tolerance ranges (10) as a
sample of size n has when $(x_r, x_{n-r+1})$ is taken as the tolerance range. The same
thing may be viewed in another way: The fact that the range of values of P' is
0 to 1 suggests that we may be able to get a fairly close approximation to the
true distribution of P' by fitting a Pearson Type I function of the form

$$\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,P'^{\alpha-1}(1 - P')^{\beta-1}\,dP', \qquad (13)$$

determining $\alpha$ and $\beta$ by equating the mean and variance of the distribution (13)
to the mean and variance of P' respectively. Accordingly we find

$$\alpha = [a^2(1 - a) - a\sigma_{P'}^2]/\sigma_{P'}^2,$$
$$\beta = [a(1 - a)^2 - (1 - a)\sigma_{P'}^2]/\sigma_{P'}^2. \qquad (14)$$

Thus it will be seen from (14) that in order for the fitted distribution (13) to be
identical with the distribution (1) a sample of only $\dfrac{t_a^2\,e^{-t_a^2}}{\pi a(1 - a)}\,(n + 2)$ cases is
needed.

In case only one tolerance limit is to be set, e.g. $\bar{x} - t_a\sqrt{(n + 1)/n}\cdot s$, the
proportion, say w', of the universe which will be included in the tail has mean
value (1 − a)/2 and variance $\dfrac{(2 + t_a^2)\,e^{-t_a^2}}{4\pi n}$ (approximately) for large n. The
ratio of this variance to that of w, which is approximately $(1 - a^2)/4n$ for
large n, gives the efficiency of using $x_r$ for the lower tolerance limit in case of a
normal universe. For example, if a = .99, the efficiency is .18.
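
The two efficiency figures can be recomputed from large-sample variance formulas. The sketch below assumes the forms $t_a^2 e^{-t_a^2}/(\pi n)$ and $a(1 - a)/n$ for the two-sided case and $(2 + t_a^2)e^{-t_a^2}/(4\pi n)$ and $(1 - a^2)/(4n)$ for the one-sided case, and replaces $t_a$ by its large-n limit, the normal deviate; these forms are a reconstruction judged by whether they reproduce the quoted ratios, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def efficiency_two_sided(a):
    """Ratio of n * var(P') to n * var(P) under the assumed large-n formulas."""
    t = NormalDist().inv_cdf((1 + a) / 2)          # large-n limit of t_a
    var_normal = t * t * math.exp(-t * t) / math.pi
    var_order = a * (1 - a)
    return var_normal / var_order


def efficiency_one_sided(a):
    """Ratio of n * var(w') to n * var(w) under the assumed large-n formulas."""
    t = NormalDist().inv_cdf((1 + a) / 2)
    var_normal = (2 + t * t) * math.exp(-t * t) / (4 * math.pi)
    var_order = (1 - a * a) / 4
    return var_normal / var_order


print(round(efficiency_two_sided(0.99), 2))
print(round(efficiency_one_sided(0.99), 2))
```

With a = .99 the two ratios round to the values .28 and .18 given in the text, which is the basis for the assumed forms.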

It is perhaps appropriate here to point out the distinction between confidence
limits and tolerance limits. It is well-known that in a sample from a normal
universe with mean m the probability is a that the confidence limits $\bar{x} \pm t_a s/\sqrt{n}$
will include the population mean m between them. The tolerance limits
$\bar{x} \pm t_a\sqrt{(n + 1)/n}\cdot s$, on the other hand are used to estimate the middle 100a%
of the universe. Although the tolerance limits $\bar{x} \pm t_a\sqrt{(n + 1)/n}\cdot s$ are much
more stable for a given sample size than those given by $x_r$ and $x_{n-r+1}$, in case
of a normal distribution, it should be emphasized that in case of even slight
non-normality, particularly when skewness is present, the former pair of limits
are apt to give very erroneous results with reference to the proportion of the
universe included in the tails. Confidence limits estimating m are probably
much less sensitive to skewness than tolerance limits estimating the middle
100a% of the universe, particularly when a is nearly unity.

Another important aspect of the problem of setting tolerance limits is the 
following: Suppose small samples of a given size are taken from a universe 
under statistical control. How many of these small samples should be taken 
as a basis for determining tolerance limits $L_1$ and $L_2$ of some function, say g,
of the samples (e.g. the sum of the measurements in each sample) so that the
proportion of samples in the universe of such samples having values of g between
$L_1$ and $L_2$ will have a given mean with a given degree of stability? One obvious
approach to this question is to consider a universe of samples in the same manner 
in which we have considered a universe of individuals throughout the present 
paper. This approach, however, does not make very efficient use of the observa- 
tions, but we shall not enter into a treatment of the problem here. This problem 
and various related problems in the statistical methods of mass production 
remain to be studied. 


4. Summary. A method based on truncated sample ranges is given for determining
the size of sample required for setting tolerance limits, with a given degree of
stability, on a random variable x having any unknown continuous distribution f(x).
A method for setting tolerance limits corresponding to a given degree
of stability in case f(x) is normal is discussed, and a comparison of the stabilities
of the tolerance limits set by the two methods in the normal case is made.
Illustrative examples of the methods are given.



ON A CERTAIN CLASS OF ORTHOGONAL POLYNOMIALS 


By Frank S. Beale
Lehigh University, Bethlehem, Pennsylvania 

Introduction. E. H. Hildebrandt has demonstrated the following theorem:¹
If y is a non-identically zero solution of the Pearsonian Differential Equation,

(1)    $\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{a_0 + a_1 x}{b_0 + b_1 x + b_2 x^2} \equiv \dfrac{N}{D}$,    $a_i$, $b_i$ real, then

(2)    $\dfrac{D^{n-k}}{y}\dfrac{d^n}{dx^n}\left(D^k y\right) \equiv P_n(k, x)$,    n, k integers, n > 0, is a

polynomial in x of degree n at most. Hildebrandt has obtained various relations
connecting the $P_n(k, x)$ and their derivatives as well as a recurrence relation.

If in (2) we set k = n there results from a proper choice of N and D in (1),
the classical Hermite, Laguerre, Jacobi and Legendre Polynomials. Many
properties of these classical polynomials have been obtained by numerous
investigators.²

One of the most important of these properties is that of orthogonality which
can be stated as follows: Consider a sequence of the classical polynomials
$\Phi_i(x) = x^i - S_i x^{i-1} + \cdots$. There exists an interval (a, b) finite or infinite and a unique
weight function $\psi(x)$, monotonic non-decreasing over (a, b) such that,

(3)    $\int_a^b \Phi_m(x)\,\Phi_n(x)\,d\psi(x) = 0$,    for $n \neq m$.

In the future we will refer to the type of orthogonality given by (3) with $\psi$ monotonic
non-decreasing as orthogonality in the restricted sense. In order to determine
whether a given system of polynomials is orthogonal in the restricted sense we
have the following theorem:³

Theorem 1. In order that the sequence of polynomials $\Phi_i(x) = x^i - S_i x^{i-1} +$


¹ E. H. Hildebrandt, "Systems of polynomials connected with the Charlier expansions,
etc.," Annals of Math. Stat., Vol. 2 (1931), pp. 379-439.

² For an account of these properties as well as an extensive bibliography the reader can
refer to one of two treatises, viz.: J. Shohat, Théorie Générale des Polynomes Orthogonaux de
Tchebichef, Mémorial des Sciences Mathématiques, Fascicule 66, Paris, Gauthier-Villars,
1936.

Gabor Szegö, Orthogonal Polynomials, Am. Math. Soc., Colloquium Publications, Vol.
23, 1939.

³ J. Shohat, "The relation of the classical orthogonal polynomials to the polynomials of
Appell," Am. Jour. of Math., Vol. 58 (1936), pp. 454-465.
$\cdots$, i = 1, 2, 3, $\cdots$ with real coefficients be orthogonal in the restricted sense it is
necessary and sufficient that there exist a recurrence relation,

(4)    $\Phi_i(x) = (x - c_i)\Phi_{i-1}(x) - \lambda_i\Phi_{i-2}(x)$,    $\Phi_0 = 1$,  $\Phi_1 = x - c_1$,

$c_i$, $\lambda_i$ const with all $\lambda_i > 0$, $i \geq 2$.

With Shohat⁴ we will say that a system of polynomials $\Phi_i(x) = x^i - S_i x^{i-1} + \cdots$,
i = 1, 2, 3, $\cdots$, with real coefficients is orthogonal in the general sense if
there exists at least one weight function $\psi(x)$, of bounded variation over (a, b) such
that (3) is satisfied. In connection with generalized orthogonality we have the
following theorem:⁴

Theorem 2. In order that the system $\Phi_i(x)$, i = 1, 2, 3, $\cdots$ be orthogonal in
the general sense it is necessary and sufficient that relation (4) be satisfied with all
$\lambda_i \neq 0$.

It is the purpose of this paper to investigate the orthogonality properties of
the general polynomials $P_n(n, x)$ given by (2). In Part 1 a general recurrence
relation is derived which applies to all the polynomials $P_n(k, x)$. In Part 2 all
the different types of orthogonal polynomials $P_n(n, x)$ are determined by making
use of the general recurrence relation derived in Part 1. We also show, following
lines laid down by Hahn⁵, that the only systems of polynomials with simple
zeros which are orthogonal in either the restricted or the general sense and whose
derivatives are orthogonal in either sense are the systems considered in Part 2.


1. The general recurrence relation. From (2) we can write,

(5)    $P_{n-1}(k, x) = \dfrac{D^{n-k-1}}{y}\dfrac{d^{n-1}}{dx^{n-1}}\left(D^k y\right) = \dfrac{D^{n-k-1}}{y}\dfrac{d^{n-1}}{dx^{n-1}}\left(D \cdot D^{k-1} y\right)$.

Apply Leibnitz Formula to the right side and make use of (2). There results,

(6)    $P_{n-1}(k, x) = P_{n-1}(k - 1, x) + (n - 1)D'P_{n-2}(k - 1, x) + \dfrac{(n - 1)(n - 2)}{1 \cdot 2}D''D\,P_{n-3}(k - 1, x)$.

From Hildebrandt's paper we have,⁶

(7)    $P_{n+1}(k + 1, x) = [N + (k + 1)D']P_n(k, x) + n[N' + (k + 1)D'']D\,P_{n-1}(k, x)$.

Decrease k and n each by one in (7) and obtain a relationship which we number
(8). Again decrease n by one in (8) and get a relation which we number (9).


⁴ J. Shohat, "Sur les polynomes orthogonaux généralisés," Comptes Rendus, Vol. 207
(1938), p. 556.

⁵ Wolfgang Hahn, "Über die Jacobischen Polynome und zwei verwandte Polynomklassen,"
Math. Zeits., Vol. 39 (1934-35), pp. 634-638.

⁶ E. H. Hildebrandt, loc. cit. p. 407.
From (6), (7), (8) and (9) eliminate $P_{n-1}(k, x)$, $P_{n-2}(k - 1, x)$, and
$P_{n-3}(k - 1, x)$. There results,

(10)    $[2N' + (2k - n + 1)D''][N' + kD'']\,P_{n+1}(k + 1, x)$

        $= \{[2N' + (2k - n + 1)D''][N' + kD''][N + (k + 1)D']$

        $\quad + n[N' + (k + 1)D''][2N'D' + kD'D'' - ND'']\}\,P_n(k, x)$

        $\quad + n[N' + (k + 1)D'']\{2(N' + kD'')^2 D$

        $\qquad - (N + kD')(2N'D' + kD'D'' - ND'')\}\,P_{n-1}(k - 1, x)$.

In (10) decrease n and k each by one and replace N and D by their values from
(1). Thus we get,

(11)    $[a_1 + (2k - n)b_2][a_1 + 2(k - 1)b_2]\,P_n(k, x)$

        $= \{[a_1 + (2k - 2)b_2][a_1 + 2kb_2][a_1 + (2k - 1)b_2]\,x$

        $\quad + [a_1 + (2k - 2)b_2][a_1 + (2k - n)b_2][a_0 + kb_1]$

        $\quad + (n - 1)[a_1 + 2kb_2][a_1b_1 + (k - 1)b_1b_2 - a_0b_2]\}\,P_{n-1}(k - 1, x)$

        $\quad + (n - 1)[a_1 + 2kb_2]\{b_0[a_1 + (2k - 2)b_2]^2$

        $\qquad - [a_0 + (k - 1)b_1][a_1b_1 + (k - 1)b_1b_2 - a_0b_2]\}\,P_{n-2}(k - 2, x)$.

In this recurrence formula the $P_n(k, x)$ have in general a coefficient of $x^n$ different
from one. Polynomials which have one for the coefficient of $x^n$ we will refer
to in the future as normalized. Let us now transform (11) for normalized $P_n(k, x)$.
Theorem 1 deals with polynomials normalized in the above sense. Let us write,

$P_n(k, x) = a_{n,k}x^n - b_{n,k}x^{n-1} + \cdots$. In (4) set, $\Phi_n(x) = P_n(k, x)/a_{n,k}$.

Thus we get,

(12)    $P_n(k, x) = (A_n x - B_n)P_{n-1}(k - 1, x) - \Gamma_n P_{n-2}(k - 2, x)$

where

$A_n = \dfrac{a_{n,k}}{a_{n-1,k-1}}$,    $B_n = c_n A_n$,    and    $\Gamma_n = \lambda_n\dfrac{a_{n,k}}{a_{n-2,k-2}}$.

Relation (12) is essentially of the same form as (11). Each of these is to be
reduced to form (4).

From a previous paper by the author⁷ we have,

(13)    $P'_{n+1}(k, x) = (n + 1)\left[N' + \tfrac{1}{2}(2k - n)D''\right]P_n(k, x)$.

n − 1 successive applications of this relation give us, [$P_0(k, x) \equiv 1$], that the
coefficient of $x^n$ in $P_n(k, x)$ is,


⁷ Frank S. Beale, "On the polynomials related to Pearson's differential equation,"
Annals of Math. Stat., Vol. 8 (1937), p. 207 (2).
(14)    $a_{n,k} = \prod_{t=0}^{n-1}\left[a_1 + (2k - n + 1 + t)b_2\right]$.

By employing (14) in (12) we see that (12) or (11) reduces to form (4) where,

(15)    $-c_n = \dfrac{[a_1 + (2k - n)b_2][a_0 + kb_1]}{[a_1 + 2kb_2][a_1 + (2k - 1)b_2]} + \dfrac{(n - 1)[a_1b_1 + (k - 1)b_1b_2 - a_0b_2]}{[a_1 + (2k - 1)b_2][a_1 + (2k - 2)b_2]}$,

(16)    $\lambda_n = -\dfrac{(n - 1)[a_1 + (2k - n - 1)b_2]\{b_0[a_1 + (2k - 2)b_2]^2 - [a_0 + (k - 1)b_1][a_1b_1 + (k - 1)b_1b_2 - a_0b_2]\}}{[a_1 + (2k - 3)b_2][a_1 + (2k - 2)b_2]^2[a_1 + (2k - 1)b_2]}$.

Equation (16) together with Theorems 1 and 2 can now be applied to the
polynomials $P_n(k, x)$.

From (14) it is seen that $P_n(k, x)$ is of degree n provided that none of the factors
of the product vanishes. This condition we assume to hold here for all n.

We can now obtain a recurrence relation for the qth derivatives of $P_n(k, x)$.
A repeated application of (13) leads to,

(17)    $\dfrac{d^q}{dx^q}P_n(k, x) = P_{n-q}(k, x)\prod_{t=0}^{q-1}(n - t)\left[a_1 + (2k - n + t + 1)b_2\right]$,

where $P_n(k, x)$ is not normalized in the above sense. By considering the right
side of (17) together with (14) we see that (17) can be divided by

$a_{n-q,k}\prod_{t=0}^{q-1}(n - t)\left[a_1 + (2k - n + t + 1)b_2\right]$

and thus normalize the polynomials on both the right and left sides of (17).
Consequently the recurrence relation for normalized $d^q[P_n(k, x)]/dx^q$, n =
0, 1, 2, $\cdots$, is identical with the recurrence relation for normalized $P_{n-q}(k, x)$
as given by (4), (15) and (16) when we replace n by n − q in these latter.


2. The different types of orthogonal $P_n(n, x)$. Suppose first that $b_2 \neq 0$ in
(1). A transformation on x with real coefficients can be effected which changes
(1) into either,

(18)    $\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{(\alpha - \beta) + (-\alpha - \beta)x}{1 - x^2}$

or

(19)    $\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{-2mx - q}{a^2 + x^2}$.

(A) Equation (18) together with (2) for k = n defines the generalized Jacobi
Polynomials (normalized in the above sense),

$J_n(x, \alpha, \beta) = \dfrac{1}{a_{n,n}}(1 + x)^{-\alpha}(1 - x)^{-\beta}\dfrac{d^n}{dx^n}\left[(1 + x)^{n+\alpha}(1 - x)^{n+\beta}\right]$
where $1/a_{n,n}$ is given by (14). If in (16) we set k = n and make proper
replacements for constants as (18) and (1) show we have,

(20)    $\lambda_n = \dfrac{4(n - 1)(\alpha + \beta + n - 1)(\alpha + n - 1)(\beta + n - 1)}{(\alpha + \beta + 2n - 3)(\alpha + \beta + 2n - 2)^2(\alpha + \beta + 2n - 1)}$,    $n \geq 2$.

From Theorem 1 and this value of $\lambda_n$ we conclude that if $\alpha > -1$, $\beta > -1$,
the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal in the restricted sense, a well-known
result. From Theorem 2 we can similarly conclude that if neither $\alpha$, $\beta$, nor
$(\alpha + \beta)$ equals −j, j a positive integer, the sequence $\{J_n(x, \alpha, \beta)\}$ is orthogonal
in the general sense. 
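
As an illustration of how (20) separates the two kinds of orthogonality, the following sketch evaluates the Jacobi $\lambda_n$ in exact rational arithmetic. The function name `jacobi_lambda` is an assumption made for this example; the formula itself is the one numbered (20).

```python
from fractions import Fraction


def jacobi_lambda(n, alpha, beta):
    """Coefficient lambda_n of relation (4) for the normalized Jacobi polynomials, n >= 2."""
    if n < 2:
        raise ValueError("lambda_n is defined for n >= 2")
    s = alpha + beta
    numerator = 4 * (n - 1) * (s + n - 1) * (alpha + n - 1) * (beta + n - 1)
    denominator = (s + 2 * n - 3) * (s + 2 * n - 2) ** 2 * (s + 2 * n - 1)
    # Raises ZeroDivisionError when a denominator factor vanishes.
    return Fraction(numerator) / Fraction(denominator)


# alpha = beta = 1/2: every lambda_n is positive (restricted sense).
positive_case = [jacobi_lambda(n, Fraction(1, 2), Fraction(1, 2)) for n in range(2, 6)]

# alpha = -3/2, beta = 0: lambda_2 is negative but not zero (general sense only).
negative_case = jacobi_lambda(2, Fraction(-3, 2), 0)
```

With $\alpha = \beta = 1/2$ every value is 1/4, while $\alpha = -3/2$ gives a negative but non-zero $\lambda_2$, so only Theorem 2 applies there.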

($A_1$) If in (18) we set $\alpha = \beta = 0$ we obtain a differential equation which
together with (2) for k = n leads to the Legendre Polynomials, (normalized in
above sense), $P_n(x) = \dfrac{n!}{(2n)!}\dfrac{d^n}{dx^n}(x^2 - 1)^n$. Setting $\alpha = \beta = 0$ in (20) leads to
$\lambda_n = \dfrac{(n - 1)^2}{(2n - 3)(2n - 1)}$, $n \geq 2$. Thus from Theorem 1 we conclude that the
Legendre Polynomials are orthogonal in the restricted sense, a result well known. 
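
The Legendre case gives a convenient numerical check of Theorem 1. The sketch below builds the normalized polynomials from relation (4) with $c_i = 0$ and the $\lambda_n$ just obtained, then integrates products exactly over (−1, 1). The helper names (`monic_sequence`, `inner_product`) and the choice of a constant weight on (−1, 1) are assumptions made for the illustration.

```python
from fractions import Fraction


def legendre_lambda(n):
    """lambda_n = (n-1)^2 / ((2n-3)(2n-1)) for n >= 2."""
    return Fraction((n - 1) ** 2, (2 * n - 3) * (2 * n - 1))


def monic_sequence(count, c, lam):
    """Polynomials from relation (4) as coefficient lists, lowest degree first."""
    polys = [[Fraction(1)]]
    if count > 1:
        polys.append([Fraction(-c(1)), Fraction(1)])
    for i in range(2, count):
        prev, prev2 = polys[i - 1], polys[i - 2]
        nxt = [Fraction(0)] + prev            # x * Phi_{i-1}
        for j, coef in enumerate(prev):       # - c_i * Phi_{i-1}
            nxt[j] -= c(i) * coef
        for j, coef in enumerate(prev2):      # - lambda_i * Phi_{i-2}
            nxt[j] -= lam(i) * coef
        polys.append(nxt)
    return polys


def inner_product(p, q):
    """Exact integral of p(x) q(x) over (-1, 1)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:              # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total


polys = monic_sequence(6, lambda i: 0, legendre_lambda)
```

Every pair of distinct polynomials in `polys` has inner product exactly zero, and `polys[3]` is the familiar $x^3 - \tfrac{3}{5}x$.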

(B) Equation (19) together with (2) for k = n leads to a class of polynomials
(normalized in above sense), mentioned by Romanovsky.⁸

$R_n(x, m, q, a) = \dfrac{1}{a_{n,n}}(a^2 + x^2)^{m}\exp\left(\dfrac{q}{a}\tan^{-1}\dfrac{x}{a}\right)\dfrac{d^n}{dx^n}\left[(a^2 + x^2)^{n-m}\exp\left(-\dfrac{q}{a}\tan^{-1}\dfrac{x}{a}\right)\right]$

where again $1/a_{n,n}$ is given by (14). In (16) set k = n and make the proper
replacements of constants and,

$\lambda_n = \dfrac{n - 1}{4}\cdot\dfrac{(2m - n + 1)[4a^2(m - n + 1)^2 + q^2]}{(2m - 2n + 3)(m - n + 1)^2(2m - 2n + 1)}$,    $n \geq 2$.

From Theorem 2 it now follows that the sequence $\{R_n(x, m, q, a)\}$ is orthogonal
in the general sense if $m \neq j/2$, j a positive integer. There is no set of parameters
m, q, a which assures orthogonality in the restricted sense.

In connection with Romanovsky's note there appear to be several discrepancies.
For the weight functions given there under types IV and V, the nth
moments for sufficiently large n do not exist over the intervals there considered.
Type V is the special case of type IV for a = 0. Type VI is none other than
Jacobi Polynomials so that the orthogonality relations given there for this case
are incorrect. In all three types listed certain of the recurrence relations for
the polynomials are in error.

($B_1$) We note here one special sub-class of $R_n$. Take m = q = 0 and a = 1
in (19). We obtain from (2) and (14) a system of normalized polynomials
analogous to the Legendre Polynomials namely, $\Phi_n(x) = \dfrac{n!}{(2n)!}\dfrac{d^n}{dx^n}(x^2 + 1)^n$.


⁸ V. Romanovsky, "Sur quelques classes nouvelles de polynomes orthogonaux," Comptes
Rendus, Vol. 188 (1929), pp. 1023-1026.
It is easy to verify for these that,

$\int_{-i}^{i}\Phi_m(x)\,\Phi_n(x)\,dx = 0$,    $m \neq n$,    $i = \sqrt{-1}$.


(C) Suppose that in (1), $b_2 = 0$, $b_1 \neq 0$. A linear transformation with real
coefficients changes (1) into, $\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{\alpha - x}{x}$. This equation together with (2)
and (14) for k = n defines the generalized Laguerre Polynomials, (normalized
in above sense), $L_n(x, \alpha) = (-1)^n x^{-\alpha}e^{x}\dfrac{d^n}{dx^n}\left(x^{n+\alpha}e^{-x}\right)$. Setting k = n and making
proper replacements in (16) we get, $\lambda_n = (n - 1)(\alpha + n - 1)$, $n \geq 2$. From
Theorem 1 we see that if $\alpha > -1$ the $L_n$ are orthogonal in the restricted sense,
a well-known result. From Theorem 2 we can say that if $\alpha \neq -j$, j a positive
integer, the polynomials are orthogonal in the general sense.

(D) If in (1), $b_1 = b_2 = 0$, $b_0 \neq 0$ we can perform a linear transformation on
x with real coefficients and get, $\dfrac{1}{y}\dfrac{dy}{dx} = hx$. This differential equation together
with (2) and (14) gives a set of normalized polynomials $G_n(x) = \dfrac{1}{h^n}e^{-hx^2/2}\dfrac{d^n}{dx^n}e^{hx^2/2}$.
Taking k = n and making proper substitutions for constants in (16) we get
$\lambda_n = -(n - 1)/h$, $n \geq 2$. If h is negative it follows from Theorem 1 that the
sequence $\{G_n(x)\}$ is orthogonal in the restricted sense. In fact, $G_n(x) = H_n(x) \equiv$
Hermite Polynomials.

On the other hand, if h is positive we have from Theorem 2 orthogonality in
the general sense. In fact, it can be easily verified for this case that,

$\int_{-i\infty}^{i\infty} e^{hx^2/2}\,G_m(x)\,G_n(x)\,dx = 0$,    $m \neq n$,    $i = \sqrt{-1}$.


(E) The only remaining possibility for (1) not so far discussed occurs when
N is constant and D is linear. In this case it has been shown that $P_n(k, x)$
of (2) reduces to a constant.⁹

E. H. Hildebrandt has shown¹⁰ that the polynomials $P_n(n, x)$ of (2) satisfy
a differential equation of the form,

(21)    $(b_0 + b_1 x + b_2 x^2)\dfrac{d^2y}{dx^2} + [a_0 + b_1 + (a_1 + 2b_2)x]\dfrac{dy}{dx} - n[a_1 + (n + 1)b_2]y = 0$,    n = 1, 2, 3, $\cdots$.

Moreover with the coefficients of $d^2y/dx^2$ and $dy/dx$ in (21) he has shown that
for (21) to have a polynomial solution of degree n the coefficient of y must be
of the form given in (21).

⁹ Frank S. Beale, loc. cit. p. 209, Theorem I.
¹⁰ Loc. cit. pp. 404-405.
From (16) we can say that for k = n and an orthogonal sequence $P_n(n, x)$,
n = 0, 1, 2, $\cdots$ we have,

(22)    $a_1 + (n - 1)b_2 \neq 0$,

(23)    $b_0[a_1 + (2n - 2)b_2]^2 - [a_0 + (n - 1)b_1][a_1b_1 + (n - 1)b_1b_2 - a_0b_2] \neq 0$,

where n is an integer $\geq 2$. Considering for (21) a solution of the type
$y = \sum_{i=0}^{\infty} c_i x^i$ we readily show that if (22) and (23) are satisfied, (21) possesses for
each n a single polynomial solution of degree n. Two solutions which differ
merely by a constant factor are regarded as the same solution. This polynomial
solution of (21) must be $P_n(n, x)$.

By employing theorems from a previous paper by the author¹¹ we can show
that if (22) and (23) are satisfied, the zeros of the polynomials of section 2 are
simple whether these zeros are real or complex.

Hahn has shown¹² that if a set of normalized polynomials and their derivatives
satisfy a relation of the form (4) with $\lambda_i \neq 0$ and if the zeros of the
polynomials are all simple then the polynomials must necessarily satisfy an equation
of form (21). Since in this paper we have considered all possible values of
$a_i$, (i = 0, 1), and $b_i$, (i = 0, 1, 2), which lead to orthogonal polynomials, it
follows that the only systems of polynomials with simple zeros and orthogonal
in either restricted or general sense whose derivatives in turn are orthogonal in
either sense are the systems of section 2.

¹¹ Loc. cit. pp. 207-209, Theorems I to III.

¹² Loc. cit. pp. 634-636.



THE SKEWNESS OF THE RESIDUALS IN LINEAR REGRESSION THEORY

By P. S. Dwyer

University of Michigan, Ann Arbor, Mich.

In obtaining the regression of y on x it is customary to show the relation
between the actual and the estimated y by computing the standard deviation
of the residuals with the use of the formula $\sigma_\epsilon = \sigma_y\sqrt{1 - r^2}$. If the errors
are distributed normally one may estimate the number of values coming within
one standard deviation, within two standard deviations, etc., of the regression
line. However these errors are not always distributed normally, and in such
a case it seems wiser to compute the skewness of the residuals and to use a
Pearson Type III curve in making the interpretation. The present paper outlines
a technique for the calculation of $\alpha_{3:\epsilon}$ which is feasible from a practical
standpoint. It is based (a) on a cumulative totals method of obtaining the
correlation coefficient which, at the same time, makes possible the determination
of the third order moments needed to evaluate the skewness and (b) on an
efficient ritual for computing the coefficient of skewness from the moments.

The determination of the normality or non-normality of the residuals is not 
always immediately evident. If the scatter diagram or correlation chart is 
presented, one can make an estimate of the extent of normality but if not, and 
the most modern and efficient computational methods do not utilize the correlation
chart, there is no way by which the presence or absence of normality can
be detected. Some research workers are opposed to the use of the more efficient 
methods (particularly the use of the Hollerith tabulators) because the correla- 
tion chart is not presented. Though within limits it is possible to use the 
tabulator to present the correlation chart simultaneously with the values needed 
to compute the correlation coefficient [1], it is here suggested that the computa- 
tion of the skewness of residuals, which can now be accomplished quite easily 
from the tabulator runs, may be substituted for the examination of the correla- 
tion chart. 

The classical least squares theory makes use of

(1)    $\epsilon = y - b_0 - b_1 x$

where $b_0$ and $b_1$ are the solutions of the normal equations. We note that the
first normal equation is $\Sigma\epsilon = 0$ so that $M_\epsilon = 0$ and the residual is a deviation.
It follows that the skewness of residuals is

(2)    $\alpha_{3:\epsilon} = \dfrac{\Sigma(y - b_0 - b_1 x)^3}{N\sigma_\epsilon^3}$.
We wish to compute $\alpha_{3:\epsilon}$ without computing the individual residuals. The
denominator causes us little concern but it seems discouraging to evaluate such
an expression as

$\Sigma y^3 - Nb_0^3 - b_1^3\Sigma x^3 - 3b_0\Sigma y^2 - 3b_1\Sigma xy^2 + 3b_0^2\Sigma y$

$\quad - 3b_0^2 b_1\Sigma x + 3b_1^2\Sigma x^2y - 3b_1^2 b_0\Sigma x^2 + 6b_0b_1\Sigma xy$

even though the values of $b_0$, $b_1$, N, $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$, $\Sigma x^3$, $\Sigma x^2y$, $\Sigma xy^2$,
$\Sigma y^3$ are available.

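A first simplification is made by summing (1) and dividing by N. We then
have

(3)    $M_\epsilon = M_y - b_0 - b_1 M_x$

and by subtracting (3) from (1) and denoting deviations by barred letters,
we have

(4)    $\bar{\epsilon} = \bar{y} - b_1\bar{x}$

so that the skewness of errors is

(5)    $\alpha_{3:\epsilon} = \dfrac{\Sigma\bar{y}^3 - 3b_1\Sigma\bar{x}\bar{y}^2 + 3b_1^2\Sigma\bar{x}^2\bar{y} - b_1^3\Sigma\bar{x}^3}{N\sigma_\epsilon^3}$.

This formula can also be expressed as

(6)    $\alpha_{3:\epsilon} = \dfrac{\mu_{03} - 3b_1\mu_{12} + 3b_1^2\mu_{21} - b_1^3\mu_{30}}{\sigma_\epsilon^3}$.

A similar formula for the skewness of the residuals of x on y is

(7)    $\alpha_{3:\epsilon'} = \dfrac{\mu_{30} - 3b_1'\mu_{21} + 3b_1'^2\mu_{12} - b_1'^3\mu_{03}}{\sigma_{\epsilon'}^3}$.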
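
The identity between (2) and (6) can be checked on any small data set. The sketch below is a minimal illustration, assuming $\mu_{ij}$ denotes the central product moment with x raised to the power i and y to the power j, and $\sigma_\epsilon^2 = \mu_{02} - b_1\mu_{11}$; the function names are hypothetical.

```python
import math


def central_moment(xs, ys, i, j):
    """Central product moment mu_ij = mean of (x - Mx)^i (y - My)^j."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) ** i * (y - my) ** j for x, y in zip(xs, ys)) / n


def residual_skewness_from_moments(xs, ys):
    """Skewness of the residuals of y on x via the moment formula (6)."""
    def mu(i, j):
        return central_moment(xs, ys, i, j)

    b1 = mu(1, 1) / mu(2, 0)
    sigma_e = math.sqrt(mu(0, 2) - b1 * mu(1, 1))
    numerator = mu(0, 3) - 3 * b1 * mu(1, 2) + 3 * b1 ** 2 * mu(2, 1) - b1 ** 3 * mu(3, 0)
    return numerator / sigma_e ** 3


def residual_skewness_direct(xs, ys):
    """Skewness computed from the individual residuals, as in (2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b1 = central_moment(xs, ys, 1, 1) / central_moment(xs, ys, 2, 0)
    b0 = my - b1 * mx
    residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    m2 = sum(e ** 2 for e in residuals) / n
    m3 = sum(e ** 3 for e in residuals) / n
    return m3 / m2 ** 1.5
```

The two functions should agree to rounding error, which is the point of formula (6): the individual residuals never need to be formed.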

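For theoretical purposes formula (6) may be put in standard units with
$b_1 = r\dfrac{\sigma_y}{\sigma_x}$, $b_1' = r\dfrac{\sigma_x}{\sigma_y}$, $\mu_{30} = \sigma_x^3\alpha_{30}$, $\mu_{21} = \sigma_x^2\sigma_y\alpha_{21}$, etc. with the resulting

(8)    $\alpha_{3:\epsilon} = \dfrac{\alpha_{03} - 3r\alpha_{12} + 3r^2\alpha_{21} - r^3\alpha_{30}}{(1 - r^2)^{3/2}}$.

As $r \to 0$, $\alpha_{3:\epsilon} \to \alpha_{3:y}$ just as $\sigma_\epsilon \to \sigma_y$ as $r \to 0$.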

Formulas (6) and (7) are of some theoretical importance in that they show how
the skewness of the residuals is connected with the skewness of the marginal
distribution. Thus

as $\mu_{11} \to 0$, $b_1$ and $b_1' \to 0$ and $\alpha_{3:\epsilon} \to \alpha_{3:y}$, $\alpha_{3:\epsilon'} \to \alpha_{3:x}$;
as $b_1 \to \infty$, $\alpha_{3:\epsilon} \to -\alpha_{3:x}$ and as $b_1' \to \infty$, $\alpha_{3:\epsilon'} \to -\alpha_{3:y}$;
as $b_1 \to 1$, $\alpha_{3:\epsilon} \to \alpha_{3:y-x}$. Similarly as $b_1' \to 1$, $\alpha_{3:\epsilon'} \to \alpha_{3:x-y}$.
It is hence possible in some cases to get a good approximation to the skewness
of the residuals if the regression coefficients and the skewness of the marginal
distribution are known.

TABLE I 


Correlation from first order cumulations 


(1) 

(2) 

(3) 1 (4) 1 (6) 

(6) 

(7) 1 (8) 1 (0) 1 (10) 

(11) 1 (12) 1 (13) 

(14) 

H 



4.00 

3.99 

3.60- 

3.49 

3.00- 

2.99 

2.60- 

2.49 

2.00- 

1.99 

1.60- 

1.49 

1.00- 

.99 

.60- 

.49 

.00- 




X 

y \ 


8 

7 

6 

6 

4 

3 

2 

1 

0 






13 

■ 

107 

220 

341 

179 

121 

60 

■ 



4.00 

6 

18 

6 

2 

5 

6 

1 







3.99 

3.60- 

6 

106 

2 

. . 

19 

29 

27 


7 


1 

1 

673 


3.49 

3.00- 

4 

178 

3 

12 

36 

63 

44 

18 

6 

5 

2 

1503 

1360 

2.99 

2.60- 

3 

270 

3 


20 

55 


33 

27 

11 

8 

2668 

■ 

2.49 

2.00- 

2 

330 

■ 

6 

11 

54 

114 

67 

46 

19 

13 

3714 

B 

1.99 

1.60- 

1 

173 

1 

1 

6 

■ 

45 

44 

34 

18 

7 

4244 

2993 

1.49 

1.00- 

0 

51 



2 

7 

14 


8 

6 

4 

4399 

2993 



Cyz 

1 

61 

269 

661 


2194 

2578 

||R|i 

2923 

2993 

12815 

Bi 



Cx, 

■ 

464 

1096 

2196 



4339 

4399 

1 4399 


J 


For actual computation, we use (6) and (7). It has been indicated previously
how the values $\Sigma x$, $\Sigma y$, $\Sigma xy$, $\Sigma x^2$ and $\Sigma y^2$ could be obtained with the
use of cumulations. An illustration used previously [2] is presented in Table I.
The information was obtained from the Office of Educational Investigations of
the University of Michigan and gives the University first semester average (X)
and the high school average (Y) for 1,126 students entering the College of Literature,
Science, and the Arts in 1928.

The new origin of each variable is taken at the class mark of the lowest class 
rather than at the class mark of a middle class as is conventional. In this way 
all negative terms are avoided in the computation of the moments. The x's are 
arranged in descending order from left to right and the y's in descending order
from top to bottom. The notation $x_y$ is used to indicate the sum of all the x's
having the same value of y. Thus the first entry in column 13 is 5·8 + 2·7 +
5·6 + 5·5 + 1·4 = 113. The column $C_{x_y}$ is obtained by cumulating the values
of $x_y$. Similarly $y_y$ is the sum of all the y's having the same value and the first
entry in column 14 is 18(6) = 108. The entries $C_{y_y}$, $C_{y_x}$, and $C_{x_x}$ are obtained
similarly.

The entries $\Sigma x$, $\Sigma y$, $\Sigma x^2$, $\Sigma xy$, $\Sigma y^2$ are found in the lower right hand box in this
position:


lx 

ly Ixy 
lx lx* 


V 


The values of $\Sigma x$ and $\Sigma y$ are obtained from the final cumulations while the value
of $\Sigma xy$ is obtained by adding the entries in the column above, or, as a check,
the entries in the row to the left. The value of $\Sigma y^2$ is obtained by adding the
entries in the row at the left of the box while the value $\Sigma x^2$ is obtained by adding
the entries above the box.

The values of the third order sums are obtained by multiplying the entries
above the box and to the left of the box successively by 1, 3, 5, 7, 9, etc. Thus,

(9)    $\Sigma x^3 = 4399 + 3(4339) + 5(4097) + \text{etc.} = 102{,}103$,

       $\Sigma x^2y = 2923 + 3(2809) + 5(2578) + \text{etc.} = 63{,}121$,

       $\Sigma xy^2 = 4244 + 3(3714) + 5(2568) + \text{etc.} = 46{,}047$,

       $\Sigma y^3 = 2993 + 3(2820) + 5(2160) + \text{etc.} = 38{,}633$.
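
The odd multipliers 1, 3, 5, ... work because $j^2 = 1 + 3 + \cdots + (2j - 1)$. The sketch below illustrates the device on a small made-up frequency list; it assumes scores coded 0, 1, 2, ... with the origin at the lowest class, as in Table I. The function names and the sample frequencies are hypothetical.

```python
def cumulate_from_top(values):
    """values[v] is the total attached to coded score v; C[j] is the sum over v >= j."""
    running = 0
    cumulated = [0] * len(values)
    for v in range(len(values) - 1, -1, -1):
        running += values[v]
        cumulated[v] = running
    return cumulated


def first_and_second_sums(values):
    """Return (sum of v * values[v], sum of v^2 * values[v]) from cumulations alone."""
    c = cumulate_from_top(values)
    first = sum(c[1:])                                        # multipliers 1, 1, 1, ...
    second = sum((2 * j - 1) * c[j] for j in range(1, len(c)))  # multipliers 1, 3, 5, ...
    return first, second


freq = [3, 1, 4, 1, 5]                        # hypothetical frequencies of scores 0..4
sum_x, sum_x2 = first_and_second_sums(freq)

# Cumulating the products x * f instead of f raises the order by one,
# which is how a third order sum such as the sum of x^3 is reached.
x_times_f = [v * f for v, f in enumerate(freq)]
_, sum_x3 = first_and_second_sums(x_times_f)
```

Applying the odd multipliers to the cumulated products $x \cdot f$ rather than to the cumulated frequencies is what produces the third order sums in (9).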

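In making the reductions we use ab − cd operations as much as possible.
We first compute

        $A_{x,y} = N\Sigma xy - (\Sigma x)(\Sigma y)$,

(10)    $A_{x,x} = N\Sigma x^2 - (\Sigma x)^2$,

        $A_{x^2,y} = N\Sigma x^2y - (\Sigma x^2)(\Sigma y)$.

We note too that

        $\mu_{30} = [NA_{x^2,x} - (2\Sigma x)(A_{x,x})]/N^3$;    $\mu_{21} = [NA_{x^2,y} - (2\Sigma x)(A_{x,y})]/N^3$

(11)

        $\mu_{12} = [NA_{y^2,x} - (2\Sigma y)(A_{x,y})]/N^3$;    $\mu_{03} = [NA_{y^2,y} - (2\Sigma y)(A_{y,y})]/N^3$

and finally we get $\alpha_{3:\epsilon}$ or $\alpha_{3:\epsilon'}$ by (6) or (7).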

The general solution is outlined on the left of Table II. We record in Fig. A
the values given by (9) and in Fig. B the values resulting from the application
of (10). The values $2\Sigma y$ and $2\Sigma x$ are inserted in Fig. B to facilitate the
calculation of Fig. C which gives the values of (11). The technique is very
easily carried out once it is understood. It can be performed with hand calculators
but it is ideally adapted to the use of the latest Marchant, Fridén, and
Monroe models equipped with automatic positive and negative multiplication,
so that ab − cd operations can be performed with a minimum of effort and a maximum
of accuracy. Actually the value of "a," which is the total frequency, is
the same for many of these operations so that there is further saving if a machine
is used which permits the locking in of a constant in such a way that it
can be used, without continued key punching, in later ab − cd operations.

TABLE II

Abbreviated techniques for computing third order central moments, etc.

Fig. A.

    N        Σx       Σx²      Σx³        |    1126     4399     20245    102103
    Σy       Σxy      Σx²y                |    2993     12815    63121
    Σy²      Σxy²                         |    10069    46047
    Σy³                                   |    38633

Fig. B.

    N        2Σx      A(x,x)   A(x²,x)    |    1126     8798     3444669  25910223
    2Σy      A(x,y)   A(x²,y)             |    5986     1263483  10480961
    A(y,y)   A(y²,x)                      |    2379645  7555391
    A(y²,y)                               |    13364241

Fig. C.

    N                 A(x,x)   N³μ₃₀      |    1126              3444669  -1131286764
             A(x,y)   N³μ₂₁               |             1263483  685438652
    A(y,y)   N³μ₁₂                        |    2379645  944161028
    N³μ₀₃                                 |    803580396

Fig. D.

    N        (b₁)     μ₂₀      μ₃₀             |    1126     (.367)   2.717    -.7925
    (b₁')    μ₁₁      μ₂₁      (-b₁³), (-3b₁') |    (.531)   .997     .4801    (-1.593)
    μ₀₂      μ₁₂      (3b₁²), (3b₁'²)          |    1.877    .6614    (.846)
    μ₀₃      (-3b₁), (-b₁'³)                   |    .5629    (-.150)


The values in Fig. D are obtained by dividing the values $A_{x,x}$, $A_{x,y}$, and
$A_{y,y}$ in Fig. C by $N^2$ and the values in the diagonal below, $NA_{y^2,y} - (2\Sigma y)A_{y,y}$,
etc., by $N^3$. The values $b_1 = \mu_{11}/\mu_{20}$ and $b_1' = \mu_{11}/\mu_{02}$ can be inserted in Fig. D adjacent
to the N. The value of the correlation coefficient is $r = \sqrt{b_1 b_1'}$.
We have too, $\sigma_\epsilon = \sqrt{\mu_{02} - b_1\mu_{11}}$ and $\sigma_{\epsilon'} = \sqrt{\mu_{20} - b_1'\mu_{11}}$ so that the standard
deviation of residuals is readily computed from the entries of Fig. D. The
numerator of (6) is readily obtained after entering $-3b_1$, $3b_1^2$, $(-b_1^3)$ in the
diagonal under the diagonal containing the third moments and multiplying by
columns. The numerator of (7) is obtained by entering $-b_1'^3$, $3b_1'^2$, $-3b_1'$,
in the same diagonal and multiplying by rows. The theory is applied to the
results of Table I and the details are presented at the right of Table II. It is
to be noted that all values indicated here are the coded values x, y and not the
original values, X, Y. However, the correlation coefficient and the skewness
of errors are independent of any such change in unit, grouping errors being
neglected.

From Fig. D we see that $b_1 = .997/2.717 = .367$, that $b_1' = .997/1.877 = .531$
and that $r = \sqrt{(.367)(.531)} = .441$. In this case we wish to estimate
college record, x, from high school record, y, so we use $b_1' = .531$ and compute
$-3b_1' = -1.593$, $3b_1'^2 = .846$, $-b_1'^3 = -.150$. It follows that

$\alpha_{3:\epsilon'} = \dfrac{-.7925 + (.4801)(-1.593) + (.6614)(.846) + (.5629)(-.150)}{[2.717 - .531(.997)]^{3/2}} = -.334$.
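
The whole reduction from Fig. A to the skewness can be retraced in a few lines. The sketch below uses the ten sums of Fig. A and the relations (10), (11) and (7); the function and variable names are illustrative, and integer arithmetic is kept until the last step so that the intermediate values can be compared with Figs. B and C.

```python
def reduce_third_order(N, Sx, Sy, Sx2, Sxy, Sy2, Sx3, Sx2y, Sxy2, Sy3):
    """Return the Fig. B and Fig. C quantities as a dict of exact integers."""
    fig = {
        "A_xx": N * Sx2 - Sx * Sx,
        "A_xy": N * Sxy - Sx * Sy,
        "A_yy": N * Sy2 - Sy * Sy,
        "A_x2x": N * Sx3 - Sx2 * Sx,
        "A_x2y": N * Sx2y - Sx2 * Sy,
        "A_y2x": N * Sxy2 - Sy2 * Sx,
        "A_y2y": N * Sy3 - Sy2 * Sy,
    }
    # N^3 times the third order central moments, as in (11).
    fig["N3_mu30"] = N * fig["A_x2x"] - 2 * Sx * fig["A_xx"]
    fig["N3_mu21"] = N * fig["A_x2y"] - 2 * Sx * fig["A_xy"]
    fig["N3_mu12"] = N * fig["A_y2x"] - 2 * Sy * fig["A_xy"]
    fig["N3_mu03"] = N * fig["A_y2y"] - 2 * Sy * fig["A_yy"]
    return fig


def skewness_x_on_y(fig):
    """Formula (7) with numerator and denominator both multiplied by N^3."""
    b = fig["A_xy"] / fig["A_yy"]             # regression coefficient of x on y
    numerator = (fig["N3_mu30"] - 3 * b * fig["N3_mu21"]
                 + 3 * b ** 2 * fig["N3_mu12"] - b ** 3 * fig["N3_mu03"])
    denominator = (fig["A_xx"] - b * fig["A_xy"]) ** 1.5
    return numerator / denominator


# Sums recorded in Fig. A of Table II (coded scores).
fig = reduce_third_order(N=1126, Sx=4399, Sy=2993, Sx2=20245, Sxy=12815,
                         Sy2=10069, Sx3=102103, Sx2y=63121, Sxy2=46047, Sy3=38633)
skew = skewness_x_on_y(fig)
```

Because no rounding is applied to the regression coefficient, `skew` may differ from the printed value in the third decimal place.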


It thus appears that a better picture of the variation of the residuals in this
case is obtained with the use of a Pearson Type III with $\alpha_3$ approximately $-\tfrac{1}{3}$
than is obtained with the use of a normal curve. It is not necessary, of course,
to form Fig. D as the results can all be obtained from Fig. C. Thus if we
multiply the numerator and denominator of (6) by $N^3$, we get entries, with the
exception of the b's, which are in Fig. C. Now in this case $b_1 = A_{x,y}/A_{x,x}$ and
$b_1' = A_{x,y}/A_{y,y}$ so that these values can be inserted in the upper left as before. Also the
powers of $b_1$ can be inserted in the lower right as in Fig. D. We have then

$\alpha_{3:\epsilon'} = \dfrac{-1{,}131{,}286{,}764 + (685{,}438{,}652)(-1.593) + (944{,}161{,}028)(.846) + (803{,}580{,}396)(-.150)}{[3444669 - (1263483)(.531)]^{3/2}}$.

We know however, since the grades were coded, that it is not sensible to carry
results to more than three places, (and, indeed, a three place determination of
the skewness is very satisfactory for interpretive purposes even though more
places might be obtained) so we cut down the number of places. The division
of numerator and denominator by $10^6$, and the dropping of the decimals results in

$\alpha_{3:\epsilon'} = \dfrac{-1131 + 686(-1.593) + 944(.846) + 804(-.150)}{[344 - 126(.531)]^{3/2}} = -.335$.


It is possible of course to duplicate the theory indicated in Table II with the 
use of moments rather than the A's. In this case Fig. A consists of 1, $\Sigma x/N$,
$\Sigma x^2/N$, etc. We have such formulas as ” -- Mintta >

where Oi* = 

It would be possible to compute the $\alpha_{4:\epsilon}$ in a somewhat similar fashion though
it would take somewhat longer. In the first place we would have to compute
$\Sigma x^2y^2$ from the correlation table. This could be done by forming the cumulation
C(i/l) and multiplying by 1, 3, 5, 7, 9, etc. When this is done, however,
it does not appear that the calculation of the central moments of the fourth order 
can be reduced to as simple a ritual as the calculation of the third order moments. 

The question should be raised as to the calculation of the skewness when 
there are two or more independent variables. This can be done, of course, but 
the calculations are lengthy. The point of the present paper is to provide an 
easy and simple technique for computing the skewness of residuals in the case 
of two variable linear regression. 
NOTES

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


NOTE ON THE ADJUSTMENT OF OBSERVATIONS 
By Arthur J. Kavanagh
The Forman Schools, Litchfield, Conn. 

The method of least squares has been extended to the adjustment of observa- 
tions with errors in more than one variable. The history of the development 
and its principal results have been given by Deming [2], [3], [4], [6]. The basis
is the assumption that for the "best" adjustment the sum of the weighted
squares of all the residuals (observed values minus adjusted values) must be 
made a minimum with respect to the adjustments to the observations and with 
respect to the parameters involved in the conditions the adjusted values must 
satisfy. In certain problems, such as some arising in the study of relative 
growth in biology, this assumption is not adequate; it is necessary that the 
sum to be minimized be generalized to include cross products as well as squares 
of the residuals. 

Suppose we have a set of n universes of g-dimensional points whose centers of 
gravity are known to satisfy certain conditions; for instance, they might all lie 
on a certain type of curve. A sample having been taken from each universe, 
the center of gravity of each sample is taken as the observed center of gravity 
of the corresponding universe, and it is desired to determine the most probable 
set of adjustments to the coordinates and the most probable set of parameters 
involved in the conditions, subject to the requirement that the adjusted values 
satisfy the conditions exactly. It is assumed that the sampling distribution of 
the center of gravity in each universe satisfies the multivariate normal law, and 
that the standard deviations and coeflScients of correlation of each sample may 
with sufficient accuracy be taken as the constants of the corresponding universe. 
Then by reasoning analogous to that of the derivation of the least squares 
principle for one variable from the univariate normal law, the probability of 
getting the observed set of values is proportional to e*"®, where 

(1) Q = E Qi 

Qi being a homogeneous quadratic function of the errors at the tth centroid and 
in general involving the cross products as well as the squares of the errors. 

The probability will be a maximum when Q is a minimum. Consequently the
best estimates for the coordinates of the centroids will be those making Q a 
minimum, subject to the conditions which the coordinates must satisfy. 

For example it may be desired to study the relation between height and weight 
among growing boys by fitting a curve to the points whose abscissa and ordinate 
are respectively average height and average weight of a particular age group, 
one point corresponding to each age group in the study. The data for such a 
study are obtained from samples of the several age groups. Then the number 
n of universes is the number of age groups being studied, each universe con- 
sisting of the totality of two-dimensional points obtained by pairing the height 
with the weight of each boy in the age group. The centroid or “average point”
of each universe would ideally be obtained from measurements of all the in- 
dividuals of that age, but since sampling must be resorted to it is necessary to 
make allowances for the sampling distributions of the centroids. It is known 
that within each age group there is correlation between height and weight [1]. 
Consequently the sampling distribution of each centroid will exhibit a correla- 
tion which can be expressed in terms of the coefficient of correlation between 
height and weight of the individuals of the universe from which the sampling 
distribution arises. The existence of this correlation results in the presence of 
the cross-product term in the exponent of the bivariate normal formula de- 
scribing the sampling distribution of the average values, that is in the Qi of each 
centroid. If there were no such correlations the cross-product term in each Qi 
would vanish and the situation would reduce to that of least squares. 

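In the general case, let $X_{1i}, X_{2i}, \ldots, X_{qi}$ be the observed coordinates of
the $i$th centroid, $x_{1i}, x_{2i}, \ldots, x_{qi}$ the adjusted values (to be determined), and
$V_{ji} = X_{ji} - x_{ji}$. Then $Q_i$ may be written

(2)
$$
\begin{aligned}
Q_i ={}& w_{11i}V_{1i}^2 + w_{12i}V_{1i}V_{2i} + \cdots + w_{1qi}V_{1i}V_{qi} \\
&+ w_{21i}V_{2i}V_{1i} + w_{22i}V_{2i}^2 + \cdots + w_{2qi}V_{2i}V_{qi} \\
&+ \cdots \\
&+ w_{q1i}V_{qi}V_{1i} + w_{q2i}V_{qi}V_{2i} + \cdots + w_{qqi}V_{qi}^2,
\end{aligned}
$$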

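the $w_{jki}$ being the weights, with $w_{jki} = w_{kji}$. Thus in the case of two variables,
if $N_i$ be the number of items in the $i$th sample, $r_i$ its coefficient of correlation,
and $\sigma_{1i}$, $\sigma_{2i}$ its standard deviations, then

$$
w_{11i} = \frac{N_i}{2(1 - r_i^2)\,\sigma_{1i}^2}, \qquad
w_{12i} = \frac{-N_i r_i}{2(1 - r_i^2)\,\sigma_{1i}\sigma_{2i}} = w_{21i}, \qquad
w_{22i} = \frac{N_i}{2(1 - r_i^2)\,\sigma_{2i}^2}.
$$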
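The two-variable weights can be illustrated with a short Python sketch. It computes the three weights from the sample size, the coefficient of correlation and the two standard deviations, and evaluates $Q_i$ for a given pair of residuals. The function names are hypothetical and chosen only for this illustration.

```python
def bivariate_weights(n_items, r, sigma1, sigma2):
    """Weights (w11, w12, w22) of Q_i for two variables; w21 equals w12."""
    common = n_items / (2.0 * (1.0 - r * r))
    w11 = common / (sigma1 * sigma1)
    w12 = -common * r / (sigma1 * sigma2)
    w22 = common / (sigma2 * sigma2)
    return w11, w12, w22


def quadratic_form(weights, v1, v2):
    """Q_i = w11*V1^2 + w12*V1*V2 + w21*V2*V1 + w22*V2^2, with w21 = w12."""
    w11, w12, w22 = weights
    return w11 * v1 * v1 + 2.0 * w12 * v1 * v2 + w22 * v2 * v2
```

When the correlation is zero the cross-product weight vanishes and the form reduces to the ordinary weighted sum of squares.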


The coefficients of the cross products in Q involve the coefficients of correla- 
tion of distributions. If the latter are all zero the cross products vanish and Q 
reduces to the sum of weighted squares, which is the basic expression of the 
least squares procedure. Consequently, from this point of view, the least squares 
assumption is equivalent to the assumption of zero correlation between the 
errors. The procedure in the more general situation might be called “least
quadratics.”

The Lagrange method of undetermined multipliers can be used to calculate
the values of the adjustments to the coordinates and the values of the param- 
eters. The procedure is the same as for least squares [2], [3], [5], the only 
difference being the somewhat greater complication of the algebra. We shall 
summarize the development here. 

The condition equations, supposed $\nu$ in number, may be written

$$F^h(x_{11}, \ldots, x_{qn};\ p_1, p_2, \ldots, p_r) = 0, \qquad h = 1, 2, \ldots, \nu,$$

where each $F^h$ may in general involve any or all of the numbers $x_{ji}$ as well as
any or all of the parameters $p_l$, whose number we suppose to be r. Let

(3)  $F^h_{ji} = \partial F^h/\partial x_{ji}, \qquad F^h_l = \partial F^h/\partial p_l$

where the $X_{ji}$ have been substituted for the x's after differentiation, and each
$p_l$ has been replaced by the best available approximate value $p_{l0}$. Let $F^h_0$ be
the value of $F^h$ after the same substitution. Also let $v_l = p_{l0} - p_l$. Then if
the V's and v's are small the conditions may be written

(4)  $\sum_i \sum_j F^h_{ji} V_{ji} + \sum_l F^h_l v_l = F^h_0, \qquad h = 1, 2, \ldots, \nu.$

Differentiate Q with respect to the V's and equate the result to zero, eliminating
the factor 2. Differentiate (4) with respect to the V's and the v's, multiply
each equation by the corresponding undetermined multiplier $-\lambda_h$, and sum
the results together with the result from differentiating Q. Collecting coefficients
of the differentials $dV_{ji}$ and $dv_l$, equating to zero and transposing the
terms involving $\lambda_h$, we get

(5)
$$
\begin{aligned}
w_{11i}V_{1i} + w_{12i}V_{2i} + \cdots + w_{1qi}V_{qi} &= [\lambda_h F^h_{1i}] \\
w_{21i}V_{1i} + w_{22i}V_{2i} + \cdots + w_{2qi}V_{qi} &= [\lambda_h F^h_{2i}] \qquad (i = 1, 2, \ldots, n) \\
\cdots \\
w_{q1i}V_{1i} + w_{q2i}V_{2i} + \cdots + w_{qqi}V_{qi} &= [\lambda_h F^h_{qi}]
\end{aligned}
$$

(6)  $[\lambda_h F^h_l] = 0, \qquad l = 1, 2, \ldots, r,$

where the brackets denote summation with respect to h.

Equations (5) can be written down easily, since the coefficients $w_{jki}$ appear
in the same order as in (2). The equations corresponding to each i form a
complete set which can be solved independently of those for other values of i.
The solution can be expressed

(7)  $V_{ji} = A_{1ji}[\lambda_h F^h_{1i}] + A_{2ji}[\lambda_h F^h_{2i}] + \cdots + A_{qji}[\lambda_h F^h_{qi}]$

where $A_{kji}$ is $(-1)^{k+j}$ times the minor corresponding to $w_{kji}$, divided by the
principal determinant. By symmetry $A_{kji} = A_{jki}$.

The V's in (4) are to be replaced by their values from (7) and the coefficients
of the $\lambda$'s collected. To facilitate this let

$$L_{hk} = \sum_{i=1}^{n} L_{hki}, \qquad\text{where}\qquad L_{hki} = \sum_{t=1}^{q} \sum_{r=1}^{q} A_{rti} F^h_{ri} F^k_{ti}.$$

Each $L_{hki}$ can be written down easily from the corresponding $Q_i$ as written in
(2): in each term $w_{rti}V_{ri}V_{ti}$ replace $w_{rti}$ by $A_{rti}$, $V_{ri}$ by $F^h_{ri}$, and $V_{ti}$ by $F^k_{ti}$.
It is important to preserve the order of the subscripts of the V's in (2), and to
treat the diagonal terms $w_{rri}V_{ri}^2$ as though written $w_{rri}V_{ri}V_{ri}$. It is seen that
$L_{hki} = L_{khi}$, and $L_{hk} = L_{kh}$. Then the substitution from (7) into (4) gives

(8)  $\sum_{k=1}^{\nu} L_{hk}\lambda_k + \sum_{l=1}^{r} F^h_l v_l = F^h_0, \qquad h = 1, 2, \ldots, \nu.$

Equations (8), with (6), are formally identical with those of the least squares
procedure which are called by Deming the “general normal equations”, and
they can be written schematically in the same manner. The further procedure
is identical with that for least squares, involving solution of the general normal
equations for the $\lambda$'s and v's, substitution of the values of the $\lambda$'s into (7) to
obtain the V's, and then adjustment of the observations by use of the V's,
and adjustment of the provisional values of the parameters by use of the v's.
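The procedure summarized above can be sketched in Python for one simple special case, chosen here only for illustration and not taken from the text: two variables (q = 2), a straight-line condition $x_{2i} - p_1 - p_2 x_{1i} = 0$, and one condition per centroid, so that the matrix of the $L_{hk}$ is diagonal. The names `solve` and `line_step` are hypothetical. The sketch performs a single linearized step: it forms the $A$'s by inverting each 2-by-2 weight matrix, builds equations (8) and (6), solves for the $\lambda$'s and v's, and applies (7).

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def line_step(X, W, p0):
    """One least-quadratics step for the assumed condition x2 = p1 + p2*x1.

    X  : observed centroids [(X1, X2), ...]
    W  : weights per centroid [(w11, w12, w22), ...], with w21 = w12
    p0 : provisional parameter values (p10, p20)
    Returns (adjusted centroids, adjusted parameters).
    """
    n = len(X)
    p10, p20 = p0
    F1, F2 = -p20, 1.0                       # dF/dx1 and dF/dx2, as in (3)
    A = []                                   # inverse of each weight matrix
    for w11, w12, w22 in W:
        det = w11 * w22 - w12 * w12
        A.append((w22 / det, -w12 / det, w11 / det))
    # Diagonal entries L_hh: replace w by A and V by F in the form (2).
    L = [a11 * F1 * F1 + 2.0 * a12 * F1 * F2 + a22 * F2 * F2
         for a11, a12, a22 in A]
    Fp = [(-1.0, -x1) for x1, _ in X]        # dF/dp1 and dF/dp2
    F0 = [x2 - p10 - p20 * x1 for x1, x2 in X]
    # Bordered system: equations (8) in the first n rows, (6) in the last two.
    size = n + 2
    M = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    for h in range(n):
        M[h][h] = L[h]
        for j in range(2):
            M[h][n + j] = Fp[h][j]
            M[n + j][h] = Fp[h][j]
        rhs[h] = F0[h]
    sol = solve(M, rhs)
    lam, v = sol[:n], sol[n:]
    adjusted = []
    for (x1, x2), (a11, a12, a22), lh in zip(X, A, lam):
        V1 = lh * (a11 * F1 + a12 * F2)      # equation (7)
        V2 = lh * (a12 * F1 + a22 * F2)
        adjusted.append((x1 - V1, x2 - V2))  # adjusted = observed - residual
    return adjusted, (p10 - v[0], p20 - v[1])
```

If the observed centroids already lie exactly on a line, one step returns that line and leaves the centroids unchanged, whatever the provisional parameter values. Repeating the step from the adjusted parameters is a natural refinement, though that is an assumption of this sketch rather than part of the summary above.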

A word of appreciation is due Dr. O. W. Richards of The Spencer Lens Company
for calling this problem to my attention, and for encouragement in the
carrying out of the solution.

THE ESTIMATION OF A QUOTIENT WHEN THE DENOMINATOR
IS NORMALLY DISTRIBUTED

By Robert D. Gordon

Scripps Institution of Oceanography, La Jolla, Calif.

1. Introduction. In an oceanographic investigation we have to deal with a
time series consisting of single pairs of observed values x, y, of two independent 
stochastic variables, whose true (mean) values we shall denote respectively by 
a, b. Of interest is the corresponding time series of quotients (b/a), which it 
is required to estimate from the observations x, y. Both x and y are approxi- 
mately normally distributed about their mean values a, b with rather large 
variances $\sigma_x^2$, $\sigma_y^2$ which can be estimated. It is easily possible for x to vanish
or even to be of opposite sign to a, although a cannot itself vanish. The re- 
quired estimates of (b/a) should have the property that they can be numerically 
integrated, i.e. that an arbitrary sum of such estimates shall equal the corre- 
sponding estimate of the true sum. 

Let us define a function $\psi(x)$ to have the property that its mathematical
expectation $E\{\psi(x)\}$ is exactly 1/a, where a = E(x). If such a function exists
we shall have

(1)  $E\{y \cdot \psi(x)\} = E(y) \cdot E\{\psi(x)\} = b \cdot (1/a) = b/a$

so that $y \cdot \psi(x)$ will be an estimate of b/a which has the required property:
namely such estimates can be added, and we have

$$E\{y_1\psi(x_1) + y_2\psi(x_2)\} = E\{y_1\psi(x_1)\} + E\{y_2\psi(x_2)\} = b_1/a_1 + b_2/a_2$$

as required. It turns out that if x is normally distributed with non-zero mean
such a function $\psi(x)$ does exist, and is given by the formula

(2)  $\psi(x) = \dfrac{1}{\sigma_x} \exp\!\left(\dfrac{x^2}{2\sigma_x^2}\right) \displaystyle\int_{x/\sigma_x}^{\infty} e^{-t^2/2}\, dt = \dfrac{1}{\sigma_x} R_{x/\sigma_x}$

where $R_u$ is the “ratio of the area to the bounding ordinate” which is tabulated
by J. P. Mills,¹ also in Pearson's tables.² Equation (2) holds if a is positive; if
a is negative the integration should extend over $(x/\sigma_x, -\infty)$. It is easy to
verify that

(3)  $E(\psi(x)) = \dfrac{1}{\sigma_x\sqrt{2\pi}} \displaystyle\int_{-\infty}^{\infty} \psi(x) \exp\!\left(-\dfrac{(x - a)^2}{2\sigma_x^2}\right) dx = \dfrac{1}{a}$

by direct substitution from (2).
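A brief Python sketch may help in checking (2) and (3) numerically. It assumes the reading of (2) as the ratio of the upper tail area of the unit normal curve to its ordinate, divided by the standard deviation of x, and uses the standard-library function `math.erfc` for the tail integral. The function names are hypothetical, and the integration range in `mean_of_psi` is an ad hoc choice that is adequate only when the mean is not small compared with the standard deviation.

```python
import math


def mills_ratio(u):
    """R_u = exp(u^2/2) * integral from u to infinity of exp(-t^2/2) dt."""
    tail = math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))
    return math.exp(0.5 * u * u) * tail


def psi(x, sigma, a_positive=True):
    """Estimate of 1/a from a single observation x, following (2)."""
    u = x / sigma
    if a_positive:
        return mills_ratio(u) / sigma
    # For negative a the integral runs from x/sigma down to minus infinity.
    return -mills_ratio(-u) / sigma


def mean_of_psi(a, sigma, step=0.001):
    """Trapezoidal approximation to the expectation (3), for positive a."""
    lo = a - 25.0 * sigma
    n = int(round(37.0 / step))          # covers a - 25*sigma .. a + 12*sigma
    h = step * sigma
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        density = math.exp(-0.5 * ((x - a) / sigma) ** 2) / norm
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * psi(x, sigma) * density
    return total * h
```

For example, `mean_of_psi(2.0, 1.0)` should agree closely with 1/2. The factor `exp(u*u/2)` overflows for very large |u|, so a production version would need an asymptotic form there.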

¹ J. P. Mills, “Table of ratio: area to bounding ordinate, for any portion of the normal
curve,” Biometrika, Vol. 18 (1926), pp. 395-400.

² Karl Pearson, Tables for Statisticians and Biometricians, part II, table III, Cambridge
Univ. Press.

2. The law of large numbers for $\psi(x)$. The function $\psi(x)$ defined by (2) has
mean value 1/a as required, but its second moment (hence variance) does not
exist, as may readily be verified. By a theorem of Khinchine³ however, its
values satisfy a law of large numbers. It will be of interest to inquire about the
“strength” of this law of large numbers for $\psi(x)$. Namely, given a positive
number $\epsilon$, how many “observations” (independent estimates) $\psi(x)$ will suffice
to guarantee probabilities of .50, .90, .95, etc. for the following inequality to
hold


(4)  $\left| \dfrac{\psi(x_1) + \psi(x_2) + \cdots + \psi(x_n)}{n} - \dfrac{1}{a} \right| \le \epsilon$

where n is the number of “observations.” 

In order to arrive at a rough answer to this question we have made use of
certain inequalities due to Tshebysheff (Tshebysheff's “method of moments”,
cf. Uspensky⁴). Let u be an arbitrary stochastic variable whose distribution
has moments of the first and second order which are known. Denote by m its
first moment, by $\sigma^2$ its variance, then it results from Tshebysheff's theory that
the probability $P(u_1, u_2)$ for a value of u to lie between $u_1$ and $u_2$ (i.e. $u_1 \le u \le u_2$)
satisfies the inequality


(5)  $P(u_1, u_2) > 1 - \dfrac{[\ldots]}{(u_1 - m)^2 + (u_2 - m)^2 + \sigma^2}.$


This inequality is independent of the values, or even the existence, of further
moments of the u-distribution beyond the second, and depends only on the
condition that the cumulant of the distribution function shall have at least three
“points of increase.”

Although $\psi(x)$ does not have a second moment, a second moment does exist
for those values of $\psi(x)$ which correspond to $x \ge -\beta > -\infty$, where $\beta$ is an
arbitrary number, positive or negative. If we can estimate the first two moments
of $\psi(x) \sim 1/x$ corresponding to a given value of $\beta$, then for a given number
n of observations we need only to divide the corresponding variance by n to
obtain $\sigma^2$ in (5), then multiply (5) by the nth power of the (normal) probability
for the inequality $x \ge -\beta$ in order to obtain a lower bound for the probability
of the inequality (4). $\beta$ is to be determined so as to yield a maximum result.

The first moment $m_1$ of $\psi(x)$ for values of $x \ge -\beta$ is easily computed, and is
given by the formula

(6)


³ J. V. Uspensky, Introduction to Mathematical Probability, p. 195, McGraw-Hill (1937).
⁴ J. V. Uspensky, l.c. pp. 365 ff.

The second moment is harder to compute, but if we place

(7)  [equation defining $\phi(\beta)$ and its auxiliary quantity; illegible in the source]

we easily obtain the relationship

(8)  [equation illegible in the source]


From (7), using a table of the probability integral, it can be verified that
$\phi(-a - 3\sigma_x) < 0.001$. Assume, therefore, as a boundary condition
$\phi(-a - 3\sigma_x) = 0$; then (8) can be integrated graphically or numerically. It is by this
means that the curves shown in Figs. 1 and 2 were determined. Computations
were also attempted for $a/\sigma_x = \tfrac{1}{2}$, $a/\sigma_x = 1$, but it was not possible to obtain
significant results: it would be necessary in these cases to take more than two
moments into account, which would lead to hopeless complications. In these
figures the ordinates represent probabilities for an observation to fall between
.90a and 1.11a (Figure 1), and between .75a and 1.33a (Figure 2), respectively.

3. Two practical formulas for computations. It seems worthwhile to note
here two simple formulas in connection with Mills' ratio (2) which will be useful
for computations. The first is the obvious relationship

(9)  $R_{-u} = 1/z - R_u$

in the notation of Pearson's tables. The second applies to large values of x,
and may be written

(10)  $\dfrac{x}{x^2 + \sigma_x^2} \le \psi(x) = \dfrac{1}{\sigma_x} R_{x/\sigma_x} \le \dfrac{1}{x}.$

(10) is true for x > 0, and can be proved by means of the differential equation
which $\psi(x)$ satisfies.
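Both formulas are easy to check numerically. The Python sketch below assumes that z in (9) denotes the ordinate of the unit normal curve at u, and that (10) bounds the estimate between x/(x² + σ²) and 1/x. The function names are hypothetical.

```python
import math


def mills_ratio(u):
    """R_u: upper tail area of the unit normal curve divided by its ordinate."""
    tail = math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))
    return math.exp(0.5 * u * u) * tail


def normal_ordinate(u):
    """z: the unit normal density at u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def reflected_ratio(u):
    """R_{-u} obtained from R_u by formula (9)."""
    return 1.0 / normal_ordinate(u) - mills_ratio(u)


def psi_bounds(x, sigma):
    """Lower and upper bounds (10) on R_{x/sigma} / sigma, valid for x > 0."""
    if x <= 0:
        raise ValueError("formula (10) requires x > 0")
    return x / (x * x + sigma * sigma), 1.0 / x
```

For large x the two bounds nearly coincide, so either can stand in for the tabulated ratio where the tables end.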

4. Remarks. The estimate $\psi(x)$ has the following inadequacy: If only a single
observation x is known, then it is unknown whether a is of like or unlike sign
compared to x. It turns out then that the mathematical expectation for the
value of $\psi(x)$ vanishes identically. This difficulty of course disappears if more
than one observation is available. Methods of avoiding this difficulty for time
series, e.g. by noting relative frequencies for observations separated by 1, 2, 3 
etc. intervals to agree in sign, will be discussed elsewhere in connection with 
practical applications. 

It may be worthwhile to note that Geary⁵ developed certain characteristics
of the distribution of a quotient, which however are not adapted to our purposes. 


NOTE ON CONFIDENCE LIMITS FOR CONTINUOUS DISTRIBUTION
FUNCTIONS

By A. Wald* and J. Wolfowitz

In a recent paper [1] we discussed the following problem: Let X be a stochastic
variable with the cumulative distribution function f(x), about which nothing is
known except that it is continuous. Let $X_1, \ldots, X_n$ be n independent, random
observations on X. The question is to give confidence limits for f(x). We
gave a theoretical solution when the confidence set is a particularly simple and
important one, a “belt.”

A particularly simple and expedient way from the practical point of view is 
to construct these belts of uniform thickness ([1], p. 115, equation 50). If the 
appropriate tables, as mentioned in our paper, were available, the construction 
of confidence limits, no matter how large the size of the sample, would be im- 
mediate. 

Our formulas (11), (16), (19), (27) and (29) are not very practical for computa- 
tion, particularly when the samples are large. We have recently learned that 

⁵ Geary, R. C., “The Frequency Distribution of a Quotient,” Jour. Roy. Stat. Soc.,
Vol. 93 (1930), pp. 442-446.

* Columbia University, New York City.
there exists a result by Kolmogoroff [2], generalized by Smirnoff [3],¹ which for
large samples gives an easy method for constructing tables, i.e. of finding $\alpha$
when c and n are given (all notations as in [1]). The result of Kolmogoroff-Smirnoff
is:

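Let $c = \lambda/\sqrt{n}$. Then for any fixed $\lambda > 0$,

$$\lim_{n \to \infty} \overline{P} = \lim_{n \to \infty} \underline{P} = 1 - e^{-2\lambda^2},$$

$$\lim_{n \to \infty} P = 1 - 2 \sum_{m=1}^{\infty} (-1)^{m-1} e^{-2m^2\lambda^2}.$$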

This series converges very rapidly. 
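The limiting formulas lend themselves to direct computation. The following Python sketch evaluates the two limits and inverts the two-sided one by bisection to obtain the half-width of a belt of uniform thickness for a given sample size. It assumes that the quantity to be found is the confidence coefficient, called `alpha` in the code. The function names and the bisection bracket are choices made for this illustration.

```python
import math


def two_sided_limit(lam, terms=100):
    """1 - 2 * sum over m >= 1 of (-1)^(m-1) * exp(-2 m^2 lam^2)."""
    total = 0.0
    for m in range(1, terms + 1):
        total += (-1) ** (m - 1) * math.exp(-2.0 * m * m * lam * lam)
    return 1.0 - 2.0 * total


def one_sided_limit(lam):
    """Common limit of the two one-sided probabilities: 1 - exp(-2 lam^2)."""
    return 1.0 - math.exp(-2.0 * lam * lam)


def lam_for_confidence(alpha, lo=0.2, hi=5.0):
    """Bisection for the lam at which two_sided_limit(lam) equals alpha."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if two_sided_limit(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def belt_half_width(alpha, n):
    """Large-sample half-width c = lam / sqrt(n) of a uniform belt."""
    return lam_for_confidence(alpha) / math.sqrt(n)
```

Because the series converges so quickly, a handful of terms already suffices for moderate values of lam; the default of 100 terms is only a safeguard for small lam.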

REFERENCES 

[1] Wald and Wolfowitz, “Confidence limits for continuous distribution functions,”
Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.

[2] A. Kolmogoroff, “Sulla determinazione empirica di una legge di distribuzione,”
Giornale dell'Istituto Italiano degli Attuari, Vol. 11 (1933).

[3] N. Smirnoff, “Sur les écarts de la courbe de distribution empirique,” Recueil
Mathématique (Matematicheskii Sbornik), New series, Vol. 6(48) (1939), pp. 3-26.

¹ In the French résumé of Smirnoff's article, on page 26, due to a typographical error
this formula is given with a factor $(-1)^m$ instead of the correct factor $(-1)^{m-1}$. The
correct result follows from equation (112), page 23, of the Russian text when $t$ is set equal
to zero.



REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 

The Sixth Annual Meeting of the Institute of Mathematical Statistics was 
held at the Stevens Hotel, Chicago, Thursday to Saturday, December 26 to 28, 
1940 in conjunction with the meetings of the American Statistical Association, 
the Econometric Society, and the American Marketing Association. The fol- 
lowing fifty members of the Institute attended the meeting: 

H. E. Arnold, C. S. Barrett, A. G. Brooks, R. W. Burgess, A. G. Clark, A. C. Cohen, Jr.,
W. G. Cochran, A. T. Craig, C. C. Craig, B. B. Day, W. E. Deming, J. L. Doob, P. S. Dwyer,
Churchill Eisenhart, J. W. Fertig, P. G. Fox, Hilda Geiringer, E. J. Gumbel, Myron
Heidingsfield, Harold Hotelling, Leo Katz, J. F. Kenney, L. F. Knudsen, Alma Kohl,
T. Koopmans, D. H. Leavens, Ida Levin, G. A. Lundberg, S. N. Lyttle, W. G. Madow, Ralph
Mansfield, G. F. T. Mayer, J. R. Miner, E. C. Molina, C. R. Mummery, J. I. Northam,
E. G. Olds, P. S. Olmstead, A. L. O'Toole, J. A. Pierce, Wilhelm Reitz, P. R. Rider, M. M.
Sandomire, L. W. Shaw, W. A. Shewhart, F. F. Stephan, S. A. Stouffer, A. G. Swanson,
S. S. Wilks, M. O. Woodbury.

The opening session, on Thursday afternoon, was devoted to contributed 
papers in probability and statistical methodology. The Chairman was Professor 
S. S. Wilks of Princeton University, and the following papers were presented: 

1. On the Calculation of the Probability Integral on Non-Central t and an Application. 

C. C. Craig, University of Michigan. 

2. Effective Methods of Graduation. 

Max Sasuly, Office of the Actuary, Social Security Board. 

3. On Some New Results in the Sampling of Discrete Random Variables. 

William G. Madow, Bureau of the Census. 

4. On the Use of Inverse Probability in Sample Inspection. 

W. Edwards Deming and W. G. Madow, Bureau of the Census. 

5. On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table when 

Some of the Marginal Totals are Known. 

F. F. Stephan, Cornell University, and W. Edwards Deming, Bureau of the Census. 

6. The Return Period of Flood Flows. 

E. J. Gumbel, New School for Social Research, New York City. 

7. A Note on the Power of a Sign Test. 

W. M. Stewart, University of Wisconsin. 

8. A New Explanation of Non-Normal Dispersion. 

Hilda Geiringer, Bryn Mawr College. 

Abstracts of these papers follow this report. 

On Friday morning a session was held jointly with the American Marketing 
Association on The Theory and Application of Representative Sampling. Under 
the chairmanship of Professor Theodore H. Brown of Harvard University, the 
following papers were presented: 

1. Background and Method. 

F. F. Stephan, Cornell University. 

2. Application to Marketing Problems.

Archibald M. Crossley, New York City.

3. Application to Agricultural Problems.

Arnold J. King, Iowa State College.

The afternoon session on Friday was held jointly with the American Statistical
Association and Econometric Society on The Analysis of Variance. The
chair was held by Professor P. R. Rider of Washington University and the following
papers were presented:

1. The Relation Between the Design of an Experiment and the Analysis of Variance.

A. E. Brandt, Soil Conservation Service.

2. The Underlying Principles of the Analysis of Variance and Associated Tests of
Significance.

Churchill Eisenhart, University of Wisconsin.

3. The Applications of the Analysis of Variance to Non-Orthogonal Data.

W. G. Cochran, Iowa State College. 

Discussion: 

Gertrude M. Cox, North Carolina State College. 

John F. Kenney, University of Wisconsin Extension Division. 

W. Edwards Deming, Bureau of the Census. 

On Saturday morning and afternoon, sessions were held with the American
Statistical Association on Collection and Use of Statistics for Quality Control in
National Defense Industries. At the morning session the following papers were
given, with Dr. C. W. Gates of the Western Electric Company in the chair: 

1. Report on the Quality Control Program of the American Standards Association. 

John Gaillard, Western Electric Company. 

2. Sample Verification in the Administration of the Population Census. 

W. Edwards Deming, Bureau of the Census. 

3. The Importance of the Statistical Viewpoint in High Production Manufacturing. 

P. L. Alger, General Electric Company. (Read by C. Eisenhart.) 

4. On the Initiation of Statistical Methods for Quality Control in Industry. 

Leslie E. Simon, Aberdeen Proving Ground. 

At the afternoon session the following papers were presented under the chair- 
manship of Dr. John Johnston of the United States Steel Corporation: 

1. The Place of Statistical Analysis in Ferrous Metallurgy. 

E. M. Schrock, Jones and Laughlin Steel Corporation. 

2. Statistical Methods in the Production and Inspection of Cast Iron Pipe. 

J. T. MacKenzie, American Cast Iron Pipe Company. 

3. Applications of Statistical Methods to Metallurgy. 

R. B. Mears, Aluminum Company of America. 

Discussion: 

Churchill Eisenhart, University of Wisconsin. 

The annual business meeting of the Institute was held on Thursday afternoon 
after the session on probability and statistical methodology, with the President 
presiding. 

The Secretary-Treasurer read the financial report for 1940. 

The Editor of the Annals of Mathematical Statistics reported on the progress
of the Annals during 1940. It was stated that manuscripts worthy of publica- 
tion were now being submitted at a rate that would justify the publication of 
a 500-page annual volume. To make this amount of publication self-supporting 
upon the expiration of the Rockefeller grant in June, 1941, it was pointed out 
that another 160 new subscriptions would have to be obtained during 1941. 
Judging from the rate at which subscriptions had been coming in during the 
past two years such an increase was considered entirely feasible with the coopera- 
tion of the members of the Institute. Various methods of effecting this increase 
were discussed at the meeting and suggested for the consideration of the Board 
of Directors. 

On behalf of the Board of Directors the President made the following report: 

1. The Report of the War Preparedness Committee, approved in preliminary 
form at the Hanover meeting, had been preprinted and some of the preprints 
had already been distributed. 

2. Arrangements had been made with the Executive Officer of the National 
Roster of Scientific and Specialized Personnel to send the statistics check list 
to all members of the Institute who are not members of the American Statistical 
Association. 

3. That preprints of the pamphlet on The Teaching of Statistics, including an 
address by Professor Harold Hotelling, discussion by Dr. W. E. Deming and
the resolutions on the teaching of statistics adopted by the Institute at the 
Dartmouth meeting had been produced and distributed. 

4. That application¹ had been made to the Executive Committee of the American
Association for the Advancement of Science through the Permanent Secre-
tary for admission to the status of an affiliated society in the Association. 

It was announced that through the annual election, carried out by mail ballot, 
the following officers were elected for 1941 (all names being those proposed by 
the Nominating Committee) : 

President: Professor Harold Hotelling 

Vice-Presidents: Professor A. T. Craig 
Professor H. C. Carver 

Secretary-Treasurer: Professor E. G. Olds 

The annual luncheon was held at noon on Friday with the President-Elect 
presiding. Short talks were made by Dr. E. J. Gumbel, Dr. T. Koopmans and 
Professor S. S. Wilks, while the annual luncheon address was delivered by 
Professor P. R. Rider. 

P. R. Rider, 
Secretary-Treasurer, 

¹ This application was approved by the Executive Committee of the A.A.A.S. at its
December 1940 meeting. 

(Presented on December 26, 1940, at the Chicago meeting of the Institute)

On the Calculation of the Probability Integral on Non-Central t and an Appli- 
cation. C. C. Craig, University of Michigan. 

It seems not to have been noted that the probability integral for non-central t can be
calculated by means of an infinite series in incomplete $\beta$-functions which converges rapidly
for small samples. The application here considered is to a test based on the randomization
principle which is the subject of E. J. G. Pitman's paper: Significance tests which may be
applied to samples from any populations (Roy. Stat. Soc. Jour., Vol. 4 (1937), pp. 119-130).
In case the samples come from normal populations with equal variance but with unequal 
means, the chance that the hypothesis of equal population means will be accepted on this 
test is given by this probability integral which is evaluated in some illustrative numerical 
examples. 

On Some New Results in the Sampling of Discrete Random Variables. William
G. Madow, Bureau of the Census.

Many statistical tables may be regarded as the result of subsampling finite populations
classified into $r \times s \times \cdots$ tables. The main aim of this paper is to derive the associated
statistical theory including both the finite and limiting distributions. After evaluating 
the fundamental distributions and the moments it is shown that under certain conditions, 
the limiting distribution is multinomial, while under other conditions the limiting distribu- 
tion is multivariate normal. These results are then applied to determine the adequate size 
of sample, and the sampling proportions from various strata. 

On the Use of Inverse Probability in Sample Inspection. W. Edwards Deming
and William G. Madow, Bureau of the Census.

The theory of inspection by sampling is abstractly equivalent to one part of the theory 
of subsampling. The theory of subsampling finite populations is considered in this paper 
in order to investigate the differences that occur when the methods of fiducial inference and 
inverse probability are used, particularly in regard to determining the adequate size of 
sample. In sample inspection, the prior distribution of failures is almost always known, 
at least approximately. In using any system of sample inspection, a number of failures will 
pass undetected. On the basis of certain prior distributions of failures, distributions are 
derived for the number and percent of failures remaining after each of several different 
possible systems of sample inspection has been applied. Formulas giving the cost of partial 
inspection are used together with these distributions in order to determine methods of 
sample inspection having various desired properties. 

On a Convergent Iterative Procedure for Adjusting a Sample Frequency Table 
When Some of the Marginal Totals are Known. Frederick F. Stephan, 
Cornell University and W. Edwards Deming, Bureau of the Census. 

The 5 per cent sample taken with the 1940 Population Census presents an interesting 
problem of estimation in which the estimates are connected by equations of condition. 
These equations arise from the fact that certain sums of estimates derived from the sample
should equal the corresponding frequencies derived from the tabulations of the census 
enumeration, i.e. the distribution of each of several variables may be known but their 
joint distribution may only be estimated from a cross tabulation of the data furnished by 
the sample. The adjustment of the sample estimates is accomplished by the principle of 
least squares and an outline of the various types of conditions for two and three variables 
is presented. The solution of the normal and condition equations is tedious when hundreds 
of sets of estimates must be adjusted but a simple iterative procedure is available (see 
Annals of Math. Stat., Vol. 11 (1940), pp. 427-444).
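The abstract does not spell out the iterative procedure. As a rough illustration only, the Python sketch below assumes it is the alternating proportional adjustment of rows and columns of a two-way table that is now commonly called iterative proportional fitting; the cited paper should be consulted for the authors' own formulation. The function name and the fixed number of sweeps are hypothetical.

```python
def adjust_to_margins(table, row_totals, col_totals, sweeps=100):
    """Alternately rescale the rows and columns of a positive two-way table.

    The row and column totals must have the same grand total.
    """
    cells = [[float(c) for c in row] for row in table]
    for _ in range(sweeps):
        # Scale each row so that it sums to its known total.
        for i, target in enumerate(row_totals):
            factor = target / sum(cells[i])
            cells[i] = [c * factor for c in cells[i]]
        # Scale each column so that it sums to its known total.
        for j, target in enumerate(col_totals):
            factor = target / sum(row[j] for row in cells)
            for row in cells:
                row[j] *= factor
    return cells
```

Each sweep preserves the cross-product ratios of the sample table while moving its margins toward the known totals. The result approximates, but need not coincide with, the exact least squares adjustment.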

The Return Period of Flood Flows. E. J. Gumbel, New School for Social 
Research (N. Y.) 

For any statistical variable the return period is defined as the mean number of trials 
necessary in order that a certain value of the variable or a greater one returns. The return 
period is a theoretical statistical function such as the distribution or the probability. In 
hydraulics the corresponding observed values are the recurrence and exceedance intervals. 

The main thesis is that the flood flows are the largest values of flows which have to be
considered as unlimited variables. The method of return periods applied to the largest values
leads without further assumptions to a formula which gives the return period T(x) of a flood
superior to x, and at the same time the most probable flood to be reached not at a certain
time, but within a certain period. This formula contains only two constants, which are 
linear functions of the mean annual flood and the standard deviation. Fuller’s formula 
turns out to be an asymptotic expression of my formula. 

This method applied to the Connecticut, Columbia, Merrimack, Cumberland, Tennessee 
and Mississippi rivers shows a very good fit between theory and observation, superior to 
the methods applied heretofore. 
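The definition of the return period given in this abstract translates directly into code: if successive trials are independent and F(x) is the probability that a trial falls below x, the mean number of trials until a value of x or more recurs is 1/(1 − F(x)). The Python sketch below shows this, together with one possible two-constant law for the largest value. That double-exponential form, and the way its constants are tied to the mean and standard deviation, are assumptions of the sketch; the abstract states only that the formula has two constants which are linear functions of the mean annual flood and the standard deviation.

```python
import math

EULER_GAMMA = 0.5772156649015329


def return_period(prob_not_exceeded):
    """Mean number of independent trials until a value >= x recurs."""
    return 1.0 / (1.0 - prob_not_exceeded)


def largest_value_cdf(x, mean, std):
    """Assumed double-exponential law for the largest value (illustrative)."""
    scale = std * math.sqrt(6.0) / math.pi      # linear in the standard deviation
    location = mean - EULER_GAMMA * scale       # linear in mean and std
    return math.exp(-math.exp(-(x - location) / scale))
```

A flood that is not exceeded in 99 years out of 100, for instance, has a return period of about 100 years.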

A Note on the Power of the Sign Test. W. M. Stewart, University of Wis- 
consin. 

Let us consider a set of N non-zero differences, of which x are positive and N − x are
negative; and suppose that the hypothesis tested, $H_0$, implies in independent sampling
that x will be distributed about an expected value of N/2 in accordance with the binomial
$(\tfrac{1}{2} + \tfrac{1}{2})^N$. As a quick test of $H_0$, we may choose to test the hypothesis $h_0$ that x has the
above probability distribution. Defining r to be the smaller of x and N − x, the test consists
in rejecting $h_0$ and therefore $H_0$ whenever $r < r(\epsilon, N)$, where $r(\epsilon, N)$ is determined by
N and the significance level $\epsilon$.

In applying such a test it is of interest to know how frequently it will lead to a rejection
of $H_0$ when $H_0$ is false and the actual situation $H_1$ implies that the probability law of x is
$(q + p)^N$, with $p \ne \tfrac{1}{2}$, thereby indicating an expectation of an unequal number of + and −
differences. The probability of rejecting $H_0$ when $H_1$, implying $p = p_1$, is true, is termed the
power of the test of $H_0$ relative to the alternative $H_1$.

A table is given for the 5% significance level ($\epsilon = .05$) showing the minimum value of
N for which the power of the test relative to $p = p_1$ exceeds $\beta$ for values of $\beta$ from .05 to .95
at intervals of .05; and for $p_1$ from .60 to .95 (and thereby for $p_1$ from .40 to .05) at intervals
of .05. The case of $\beta > .99$ is also considered for these values of $p_1$.
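The quantities described in this abstract can be computed exactly from the binomial law. The Python sketch below is one way to do so. It adopts the convention, assumed here, that the test rejects when r is at most k, where k is the largest integer whose two-sided tail probability under p = 1/2 does not exceed the significance level; this k corresponds to the abstract's critical value less one. The function names are hypothetical.

```python
from math import comb


def tail_probability(n, k, p):
    """P(min(x, n - x) <= k) when x is binomial with parameters n and p."""
    q = 1.0 - p
    return sum(comb(n, x) * p ** x * q ** (n - x)
               for x in range(n + 1) if min(x, n - x) <= k)


def critical_value(n, eps=0.05):
    """Largest k with P(r <= k | p = 1/2) <= eps, or -1 if none exists."""
    k = -1
    while k + 1 <= n // 2 and tail_probability(n, k + 1, 0.5) <= eps:
        k += 1
    return k


def power(n, p1, eps=0.05):
    """Probability of rejecting the null hypothesis when p equals p1."""
    k = critical_value(n, eps)
    return tail_probability(n, k, p1) if k >= 0 else 0.0


def minimum_n(p1, beta, eps=0.05, n_max=500):
    """Smallest N whose power against p1 exceeds beta; None if not found."""
    for n in range(1, n_max + 1):
        if power(n, p1, eps) > beta:
            return n
    return None
```

With ten differences and a 5% level, for example, the test rejects only when at most one difference has the rarer sign.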

A New Explanation of Non-Normal Dispersion. Hilda Geiringer, Bryn
Mawr College.

The starting point of the Lexis theory consists in this fact: It is to be expected, on the
average, that two expressions $\Sigma$ and $\Sigma'$ which can be computed from the results of $m \cdot n$
observations are equal, provided that the corresponding $m \cdot n$ chance variables $x_{\mu\nu}$ are
equally and independently distributed. Let $a_\mu$ be the average $\frac{1}{n}\sum_{\nu} x_{\mu\nu}$ and a the
average of the $a_\mu$ ($\mu = 1, \ldots, m$). Then

[The two displayed formulas defining $\Sigma'$ (with denominator $mn - 1$) and $\Sigma$ (with numerator $\sum_\mu (a_\mu - a)^2$ and denominator $m - 1$) are illegible in the source.]

We see, however, that rows and columns do not play the same role here because $\Sigma$ depends
only on the $a_\mu$, the average values of the rows. If the observed value of $\Sigma$ happens to be
larger (smaller) than the value of $\Sigma'$, we speak of supernormal (subnormal) dispersion.
It is well known that supernormal dispersion can be explained by assuming that the $m \cdot n$
theoretical populations are only equal “by rows” but not by columns (there are m different
distributions); in the same way one can explain the case of subnormal dispersion by admitting
that the distributions are equal “by columns,” but not by rows.

Another explanation which may sometimes seem more plausible is the following: All
the $m \cdot n$ distributions are supposed to be equal, but we omit the assumption of mutual
independence. Then one can prove that the supernormal or subnormal dispersion corresponds
respectively to an appropriately defined “positive” or “negative correlation.” The fact
that normal dispersion occurs rather rarely in social questions is then reflected by the idea
that social phenomena are in fact not independent of each other but are usually only
assumed so for the purpose of simplicity. In that way the more frequent occurrence of
supernormal dispersion likewise finds an adequate explanation.




THE CYCLIC EFFECTS OF LINEAR GRADUATIONS PERSISTING IN
THE DIFFERENCES OF THE GRADUATED VALUES

By Edward L. Dodd 
University of Texas 

1. Scope of inquiry. Slutzky [1] applied the moving sum, the repeated moving sum, and other linear processes to random numbers obtained from lottery drawings. But the graph of the moving sum becomes, when the vertical scale is changed in the ratio of $n$ to 1, the graph of the moving average, the simplest form of graduation. When cyclic effects are studied, there is no essential difference between a moving sum and a moving average, nor between a general linear process with coefficients $a_1, a_2, \dots, a_n$, having sum $A \neq 0$, and the corresponding graduation, with coefficients $a_i' = a_i/A$. Thus Slutzky's work throws considerable light upon graduation, although his main interest was in summation.

Slutzky found that the graphs of moving sums of random numbers bore 
strong resemblance to graphs of economic phenomena, such as [1, p. 110] that 
of English business cycles from 1855 to 1877. In fact, Slutzky regards the 
fluctuations in economic phenomena as due largely to a synthesizing of random 
causes. 

In general the undulatory character of such values cannot be described as periodic, since the waves are of different length. But Slutzky found that, upon operating on random data having mean zero and constant variance, the resulting values approach a sinusoidal limit under certain conditions, in particular, when a set of $n$ summations by twos is followed by $m$ differencings, and as $n \to \infty$, $m/n \to$ a constant. Romanovsky [2] generalized this result by taking successive summations of $s$ consecutive elements of the data, with $s \geq 2$; but required that $m/n \to \alpha \neq 1$. However, the cases which are of interest to me just now are those for which $m = n - 1$ or $m = n - 2$; and for these cases $m/n \to 1$. Romanovsky considers the case of $m = n - 1$ (not, however, as leading to a sinusoidal limit) and gives in formula (46) the value of a coefficient of correlation, which I deduce directly. From his formula (43) a corresponding coefficient of correlation can be obtained for the case of $m = n - 2$, as the sum of certain products. I need a simpler expression than this, which I obtain directly. In my treatment, these coefficients are the cosines of angles; and the ratio of such an angle to a whole revolution is an expected frequency of occurrence.

After setting forth in Section 2 some preliminary formulas, I treat in Section 3 the results of applying to random data an indefinite number $k + 2$ of summations or averagings, followed by $k$ differencings, the number of terms in a sum remaining fixed. In Section 4, however, only a few differencings are applied to a graduation. In particular the Spencer 21-term formula is studied in some detail. In former papers [3, 4] I have dealt with the immediate effects of graduations upon random data.

The question to be considered in this paper is this: Do the cyclic effects appearing in the graduated values persist in the successive differences? And, if so, do these effects fade out gradually or, on the other hand, do they come to a rather abrupt termination?

These differences of graduated values, indeed, up to the third, fourth or fifth 
are of considerable importance. Henderson [5] defines the smoothing coefficient 
of a given graduation as the ratio of the theoretical standard deviation of the 
third differences for the graduated values to that for the original values or data. 

2. Preliminary notions and formulas. The data to be graduated will be supposed to be independent, or uncorrelated, or as Slutzky expresses it, "incoherent." This will imply that the expected value of the product of two different chance variates is the product of their expected values.

Now the operations of summing and differencing as used here are not inverse. To illustrate: Given as independent $u, v, w, x, y, z, \dots$. Summing by twos yields the sequence $u + v$, $v + w$, $w + x$, $x + y$, $y + z$, $\dots$. But the first differences of these numbers, $w - u$, $x - v$, $y - w$, $z - x$, $\dots$ are alternately correlated; thus $w - u$ is negatively correlated with $y - w$; $x - v$ with $z - x$, etc. Indeed, successive differencing following successive summing does not lead back to the original condition of incoherency. However, under certain conditions, the resulting coherency may be so slight that the final succession of numbers may have just about the same chaotic properties as the succession of data.
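
The illustration above can be checked mechanically. The following is a minimal sketch, assuming uncorrelated inputs of equal variance; the helper names `convolve` and `lag_correlation` are hypothetical. A linear process is represented by its list of coefficients, and two processes applied in succession are combined by convolving the lists.

```python
def convolve(a, b):
    """Coefficients of the linear process obtained by applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def lag_correlation(coeffs, lag):
    """Correlation between outputs `lag` steps apart, for uncorrelated
    equal-variance inputs: sum of c[r]*c[r+lag] divided by sum of c[r]**2."""
    num = sum(coeffs[r] * coeffs[r + lag] for r in range(len(coeffs) - lag))
    den = sum(c * c for c in coeffs)
    return num / den

sum_by_twos = [1, 1]
first_difference = [-1, 1]
combined = convolve(sum_by_twos, first_difference)  # coefficients of w - u

adjacent = lag_correlation(combined, 1)   # w - u against x - v
alternate = lag_correlation(combined, 2)  # w - u against y - w
```

Here `combined` is `[-1, 0, 1]`, adjacent differences are uncorrelated, and alternate differences have correlation $-1/2$, in agreement with the statement that $w - u$ is negatively correlated with $y - w$.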

In my paper [3, p. 262], I set forth a number of features on the basis of which a cycle length could be defined. One of these involves the frequency of maxima. Given independent chance variables, each subject to the same law of distribution,

(1) $P(x_i \leq x) = \Phi(x)$;

where $\Phi(x)$ has a derivative $\phi(x)$. It is then easy to see that the expected relative frequency of maxima is 1/3. That is:

(2) $P(x_{i-1} \leq x_i \geq x_{i+1}) = \int_{-\infty}^{\infty} [\Phi(x)]^2 \phi(x)\,dx = 1/3$.

Now, for a given feature, a cycle length is defined as the reciprocal of the theoretic relative frequency. Then the cycle length here for maxima is three. It is well known that averaging tends to remove maxima. Thus, upon averaging or summing, the cycle length tends to increase. It is almost as well known that differencing tends to increase the frequency of maxima, and thus decrease cycle length. For if $z_i = \Delta y_i = y_{i+1} - y_i$, then between two maxima of $y_i$, there is at least one minimum (strong and weak) of $y_i$; and following this minimum and before passing the next maximum of $y_i$ there is at least one maximum of $z_i$. Successive differencing tends to reduce the cycle length of maxima from 3 to 2, that is, to make the graph a perfect zig-zag where positive and negative values of $z_i$ alternate. A set of differencings following a set of summings may bring the cycle length from some fairly large number back to about 3, and thus restore something like the original chaotic appearance in the graph.
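
The value 1/3 in (2) does not depend on the law of distribution, and a short enumeration makes this plain. The sketch below assumes only that ties have probability zero, so that the six orderings of three successive values are equally likely; the function name `maximum_frequency` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def maximum_frequency():
    """Fraction of the equally likely orderings of three successive values
    in which the middle value exceeds both neighbours."""
    orderings = list(permutations(range(3)))
    hits = sum(1 for left, mid, right in orderings if mid > left and mid > right)
    return Fraction(hits, len(orderings))

frequency = maximum_frequency()
cycle_length = 1 / frequency  # reciprocal of the relative frequency
```

The enumeration gives a frequency of 1/3 and hence a cycle length of 3 for maxima in unprocessed data.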

In dealing with the foregoing $\Phi(x)$ or $\phi(x)$ in (2), it was not assumed that the distribution be normal. But, in what follows, it will be assumed that

(3) $\phi(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - m)^2/2\sigma^2}$,

and, for convenience, $m$ will be taken as zero; that is, the data will be supposed given as deviations from their theoretic mean. Actually, the data used by Slutzky and the data I have used belong to a rectangular distribution, as noted in my former paper. Nevertheless the close agreement between actual and expected results seems to indicate [3, p. 263] that the theory is in general applicable. It is well known that averaging of observations from non-normal distributions may lead rather quickly to an approximately normal distribution.

Given $n$ real numbers, $a_1, a_2, \dots, a_n$, let

(4) $y_j = a_1 x_i + a_2 x_{i+1} + \cdots + a_n x_{i+n-1}$; $\quad i = 1, 2, 3, \dots$.

Then $y_j$ is the moving sum if each $a_r = 1$. Slutzky takes $j = i$ or $j = i + n - 1$. Again, $y_j$ is the moving average if each $a_r = 1/n$. For graduation in general, the condition $\sum a_r = 1$ is imposed; and usually $j = i + (n - 1)/2$. If $n$ is odd, $y_j$ is thus associated with the middle $x$.

Under the assumption that the $x_i$ are independent and normally distributed about mean zero, with constant variance, I have proven [3, p. 256]: The probability that for any specified $j$, $y_{j-1} < 0$ and $y_j > 0$ is given by $P = \theta/360^\circ$, where

(5) $\cos\theta = \sum_{r=1}^{n-1} a_r a_{r+1} \Big/ \sum_{r=1}^{n} a_r^2$.

The expected relative frequency of up-crossings of the graph of the $y_j$ through the zero base line is then $\theta/360^\circ$. That is: $\theta/360^\circ$ is the expected relative frequency of a change in the sign of $y$ from $-$ to $+$; also, of a change in sign from $+$ to $-$.

But, as $\Delta y_j = y_{j+1} - y_j$, it follows that

(6) $\Delta y_j = b_1 x_i + b_2 x_{i+1} + \cdots + b_n x_{i+n-1} + b_{n+1} x_{i+n}$,

where

(7) $b_1 = -a_1$, $\quad b_{n+1} = a_n$, $\quad b_r = a_{r-1} - a_r$, $\quad r = 2, 3, \dots, n$;

and since a maximum for the $y$'s at $y_j$ occurs when $\Delta y_{j-1} > 0$, $\Delta y_j < 0$, it follows that the theoretic frequency therefor is $\theta'/360^\circ$, where

(8) $\cos\theta' = \sum_{r=1}^{n} b_r b_{r+1} \Big/ \sum_{r=1}^{n+1} b_r^2$.

In a similar manner, by using second differences, we get the expected relative frequency $\theta''/360^\circ$ for inflexional points, in specified direction. Moreover, $\theta \leq \theta' \leq \theta'' \leq \cdots \leq 180^\circ$; since inflections must be at least as frequent as maxima, etc.

If the foregoing formulas are applied to the identical "graduation" $y_j = x_j$, then $\cos\theta = 0$, $\cos\theta' = -1/2$, $\cos\theta'' = -2/3$. In fact,

(9) $\cos\theta^{(t)} = -t/(t + 1)$.

This follows from the fact that the $b$'s and similar coefficients are the binomial coefficients; and

(10) $\sum_{r=0}^{t} ({}_tC_r)^2 = {}_{2t}C_t$; $\quad \sum_{r=0}^{t-1} {}_tC_r \cdot {}_tC_{r+1} = {}_{2t}C_{t-1}$.

Thus repeated differencing leads toward the perfect zig-zag. An extension of this feature will be taken up in the next section.

3. Repeated summing and differencing. To indicate the result of the summing of $n$ consecutive numbers in a sequence, I shall use the notation $1^n$. And the difference $y_{i+1} - y_i$ will be indicated by $-1, 0^{n-1}, 1$. Thus if $n = 3$, $1^3$ and $-1, 0^2, 1$ will stand respectively for

(11) $y_i = x_{i-1} + x_i + x_{i+1}$; $\quad \Delta y_i = -x_{i-1} + 0x_i + 0x_{i+1} + x_{i+2}$.

If, now, $z_i = y_{i-1} + y_i + y_{i+1}$, then

(12) $z_i = x_{i-2} + 2x_{i-1} + 3x_i + 2x_{i+1} + x_{i+2}$.

Since $(n)$ is often used to indicate the operation of summing $n$ consecutive numbers, we may write

(13) $(3)^2 = 1, 2, 3, 2, 1$; $\quad (n)^2 = 1, 2, \dots, (n - 1), n, (n - 1), \dots, 2, 1$.

Then, for $n > 2$,

(14) $\Delta(n)^2 = -1^n, 1^n$; $\quad \Delta^2(n)^2 = 1, 0^{n-1}, -2, 0^{n-1}, 1$.

And, since the operations of summing and differencing are commutative, we are led to

(15) $F_k = (-1)^k \Delta^k (n)^k = {}_kC_0, 0^{n-1}, -{}_kC_1, 0^{n-1}, {}_kC_2, 0^{n-1}, \dots, (-1)^k {}_kC_k$;

as may be established by induction. For from the foregoing, it follows that

(16) $(-1)^k \Delta^k (n)^{k+1} = {}_kC_0^{\,n}, -{}_kC_1^{\,n}, \dots, (-1)^k {}_kC_k^{\,n}$.

Then, since ${}_{k+1}C_r = {}_kC_r + {}_kC_{r-1}$, we conclude that

(17) $F_{k+1} = (-1)^{k+1} \Delta^{k+1} (n)^{k+1} = {}_{k+1}C_0, 0^{n-1}, -{}_{k+1}C_1, 0^{n-1}, \dots, (-1)^{k+1} {}_{k+1}C_{k+1}$.

If now $n \geq 2$, then from (5) and (15) we find that

(18) $\cos\theta = 0$; $\quad \theta/360^\circ = 1/4$.

Thus, the expected frequency of the changes in sign of $\Delta^k(n)^k$ is the same as that for the raw or ungraduated data. Moreover, if $n \geq 3$, (8) leads to $\cos\theta' = -1/2$, found for the data. For, in this case, at least two zero coefficients intervene between any two non-zero coefficients. And thus

(19) $\cos\theta' = -\sum_{r=0}^{k} ({}_kC_r)^2 \Big/ 2\sum_{r=0}^{k} ({}_kC_r)^2 = -1/2$.

In fact, the same factor cancels from numerator and denominator as we take higher differences, if a sufficient number of zeros intervene. More explicitly stated, the formula (9) found for the data is valid also for $\Delta^k(n)^k$, provided $n \geq t + 2$.

To make this more concrete, it may be noted that cycle lengths corresponding to $t = 0, 1, 2, 3$, and 4, are respectively

(20) 4, 3, 2.73, 2.60, 2.52.

From (15), we see directly that an element of $\Delta^k(n)^k$ is correlated only with certain other elements which are at distances from it which are multiples of $n$.

Some of the foregoing results may be included in a theorem as follows:
Theorem: Given a sequence of independent chance variates, each subject to the normal distribution (3) with mean zero. Upon this material, let $k$ summings or averagings by $n$ be performed and $k$ differencings, in any order. Then the resulting sequence has something of the same chaotic nature as the data. In particular for $n \geq 2$ the expected frequency of changes of sign is the same, viz., 1/4 for change from minus to plus and 1/4 for change from plus to minus. Moreover, as $n$ is increased from 2 to 3, 4, 5, $\dots$, the expected frequency of other characteristics becomes the same, maxima and minima, points of inflection, etc., in accordance with (9).
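
The theorem and the cycle lengths in (20) can be checked numerically. The sketch below is an illustration, not the author's procedure: it builds the coefficients of $\Delta^k(n)^k$ by repeated convolution and evaluates the cosine of (5) directly. The helper names (`convolve`, `repeat`, `cos_theta`, `cycle_length`, `sum_then_difference`) are hypothetical, and the overall sign of the coefficients is ignored because it does not affect the ratio in (5).

```python
from math import acos, pi

def convolve(a, b):
    """Coefficients of the linear process obtained by applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def repeat(kernel, times):
    """Coefficients of `kernel` applied `times` times in succession."""
    out = [1]
    for _ in range(times):
        out = convolve(out, kernel)
    return out

def cos_theta(coeffs):
    """Right member of (5): adjacent products over the sum of squares."""
    num = sum(x * y for x, y in zip(coeffs, coeffs[1:]))
    den = sum(x * x for x in coeffs)
    return num / den

def cycle_length(coeffs):
    """360 degrees divided by theta, for changes of sign in one direction."""
    return 2 * pi / acos(cos_theta(coeffs))

def sum_then_difference(n, sums, diffs):
    """`sums` summings by n followed by `diffs` first differencings."""
    return convolve(repeat([1] * n, sums), repeat([-1, 1], diffs))

# Cycle lengths for the t-th differences of unprocessed data, as in (20).
lengths = [round(cycle_length(repeat([-1, 1], t)), 2) for t in range(5)]
```

With $n = 5$ and $k = 3$, taking $t = 0, 1, 2, 3$ further differences of `sum_then_difference(5, 3, 3)` reproduces $-t/(t + 1)$, as the proviso $n \geq t + 2$ requires.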

But, suppose now that after $k + 1$ summings by $n$, only $k$ differencings are performed. Is the resulting sequence almost chaotic? Hardly so. At least, it can be shown that changes of sign in each direction have no longer an expected frequency fixed at 1/4; but this expected frequency decreases as $n$ increases. To show this, formula (5) is applied to (16); and setting in (10), $C = {}_{2k}C_k$, $C' = {}_{2k}C_{k-1}$, it follows that

(21) $\cos\theta = [(n - 1)C - C']/nC = 1 - (2k + 1)/n(k + 1)$.

Then $\cos\theta > 1 - 2/n$; and the cycle length for expected changes of sign in definite direction is somewhat greater than that obtained by setting $\cos\theta = 1 - 2/n$. For values of $n$ not too small, we may write $\cos\theta = 1 - \theta^2/2$, approximately; and then approximately

(22) cycle length for definite change of sign in $\Delta^k(n)^{k+1}$ is $\pi\sqrt{n}$.

If $n = 9$, this approximate length is 9.4, assuming $k$ fairly large, whereas the more exact length is 9.2.
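
The two figures quoted for $n = 9$ follow directly from (21) and (22). A brief sketch, with hypothetical function names, evaluating the closed forms:

```python
from math import acos, pi, sqrt

def cos_theta_21(n, k):
    """Formula (21): k + 1 summings by n followed by k differencings."""
    return 1 - (2 * k + 1) / (n * (k + 1))

def cycle_length(cos_theta):
    """360 degrees divided by theta, with theta recovered from its cosine."""
    return 2 * pi / acos(cos_theta)

n = 9
limit_length = cycle_length(1 - 2 / n)  # large-k limit of (21)
approx_length = pi * sqrt(n)            # approximation (22)
lengths_by_k = {k: cycle_length(cos_theta_21(n, k)) for k in (1, 5, 50)}
```

For every finite $k$ the length from (21) lies above `limit_length`, which is the sense of "somewhat greater" in the text.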

Consider now the result of summing $k + 2$ times, and then differencing only $k$ times. For this purpose, a few formulas for summing squares will be useful. By the method of differences it can be shown that if $l = a + nh$, and

(23) $T = a^2/2 + (a + h)^2 + (a + 2h)^2 + \cdots + (a + (n - 1)h)^2 + l^2/2$,

then

(24) $T = n(a^2 + al + l^2)/3 + (l - a)^2/6n$.

Suppose, now, that $a/n$ takes on the values $0, {}_kC_0, -{}_kC_1, \dots, (-1)^k{}_kC_k$ in succession, while $l/n$ takes on the values ${}_kC_0, -{}_kC_1, \dots, (-1)^k{}_kC_k, 0$. Let $U$ be the sum of the $(k + 2)$ values of $T$ thus obtained. Then by (10),

(25) $U = n^3(2\,{}_{2k}C_k - {}_{2k}C_{k-1})/3 + n \sum_{r=0}^{k+1} ({}_{k+1}C_r)^2/6$,

(26) $U = \dfrac{n^3 (k + 2)(2k)!}{3\,k!\,(k + 1)!} + \dfrac{n (2k + 1)(2k)!}{3\,k!\,(k + 1)!}$.

Now, by applying to (16) one more summation by $n$, there are formed $(k + 2)$ arithmetic progressions of $(n + 1)$ terms each, alternately increasing and decreasing. The maximum and minimum terms at the juncture of the progressions are to be split into two halves to apply (23). Then the sum of the squares of these coefficients is given by (26). This forms a denominator for (5).

To obtain the numerator for (5) we note that from $ab = [a^2 + b^2 - (a - b)^2]/2$ it follows that if

(27) $V = a(a + h) + (a + h)(a + 2h) + \cdots + (a + (n - 1)h)(a + nh)$;

then, from (23),

(28) $V = T - nh^2/2 = T - (l - a)^2/2n$.

If now $W$ is the sum of such $V$'s, reference to the last terms of (24) and (26) shows that

(29) $W = U - (n/2)\,{}_{2k+2}C_{k+1}$.

And hence, from (5),

(30) $\cos\theta = \dfrac{(k + 2)n^2 - 4k - 2}{(k + 2)n^2 + 2k + 1}$.

Then

(31) $\cos\theta > \dfrac{n^2 - 4}{n^2 + 2}$;

but only slightly greater when $k$ is large. Again

(32) $\cos\theta > 1 - 6/n^2$;

but only slightly greater when $n$ is not small. In this case, $\cos\theta = 1 - \theta^2/2$, approximately. And thus, approximately, for large $k$, and for $n$ not small

(33) cycle length for definite change of sign of $\Delta^k(n)^{k+2}$ = $1.81\,n$.

This gives for $n = 10$ a cycle length of 18.1; whereas, if $\cos\theta$ is taken as the right member of (31), the cycle length is 18.2.
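
Formula (30) can be compared with a direct computation. In the sketch below (hypothetical helper names; an illustration rather than the author's derivation), the coefficients of $k + 2$ summings by $n$ followed by $k$ differencings are built by convolution, and the ratio in (5) is evaluated from them.

```python
from math import acos, pi, sqrt

def convolve(a, b):
    """Coefficients of the linear process obtained by applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def repeat(kernel, times):
    """Coefficients of `kernel` applied `times` times in succession."""
    out = [1]
    for _ in range(times):
        out = convolve(out, kernel)
    return out

def cos_theta(coeffs):
    """Right member of (5): adjacent products over the sum of squares."""
    num = sum(x * y for x, y in zip(coeffs, coeffs[1:]))
    den = sum(x * x for x in coeffs)
    return num / den

def cos_theta_30(n, k):
    """Closed form (30)."""
    return ((k + 2) * n * n - 4 * k - 2) / ((k + 2) * n * n + 2 * k + 1)

def coefficients(n, k):
    """k + 2 summings by n followed by k first differencings."""
    return convolve(repeat([1] * n, k + 2), repeat([-1, 1], k))

n = 10
length_33 = 2 * pi * n / sqrt(12)                           # (33), i.e. about 1.81 n
length_31 = 2 * pi / acos((n * n - 4) / (n * n + 2))        # from the right member of (31)
```

For $n = 10$ the two lengths round to 18.1 and 18.2, the figures given in the text.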

Thus, if a $(k + 2)$-fold summation or averaging of random data is followed by only $k$ differencings, the resulting graduation or linear processing $z = \Delta^k(n)^{k+2}$ is decidedly not as chaotic as the data; as seen from (31) and (33). But further, $\Delta z = \Delta^{k+1}(n)^{k+2}$; and thus from (22) the cycle length for the expected maxima of $z$ is about $\pi\sqrt{n}$.

Now Slutzky [1, p. 109] distinguished conspicuous waves from inconsequential "ripples." On this basis, the frequency of significant cyclical features for a chance variable, such as $z$, would be less than the frequency of the maxima. It is not so clear that the frequency of significant features of a chance variable will be greater than that for changes of sign in definite direction. That turned out to be true for graduated values such as discussed in my earlier paper [3, p. 262]. If this be also valid for $z$, we would expect that conspicuous "waves" of $\Delta^k(n)^{k+2}$ would have average length between $\pi\sqrt{n}$ and $1.81n$, except for small values of $n$ and $k$.

4. Graduations or linear processes and their successive differences. If double summation by $n$ is followed by a single differencing, the result, as indicated in (14), is, for $n = 3$,

(34) $y_j = -x_i - x_{i+1} - x_{i+2} + x_{i+3} + x_{i+4} + x_{i+5}$.

Then

(35) $y_{j+3} = -x_{i+3} - x_{i+4} - x_{i+5} + x_{i+6} + x_{i+7} + x_{i+8}$.

Thus $y_j$ and $y_{j+3}$ are negatively correlated; since $x_{i+3}$, $x_{i+4}$, and $x_{i+5}$ appear in each, but with sign changed. This would seem to tend to make maxima alternate with minima at distances of about 3; or at distances of $n$, in the general case (14). Here, following Slutzky and Romanovsky, the coefficient of correlation $r_p$ between elements at a distance of $p$ is taken as

(36) $r_p = E(x_r \cdot x_{r+p})/E(x_r^2)$.

Using computed averages, instead of expected values, Alter [6] recommends a "correlation periodogram," in which $r_p$ is the ordinate for abscissa $p$.

Moreover, we would expect a graduation (4) with coefficients $a_x$ proportional to the ordinates $y$ of the sinusoid $y = \sin(\alpha + 2\pi x/p)$ taken for $x = 1, 2, 3, \dots$ to impress upon random data oscillations with maxima separated from minima by about $p/2$. But such $a_x$, as well as those in (34), have abrupt endings which introduce noticeable alterations. More satisfactory results come from tapering ends, such as appear in damped vibration, with coefficients about proportional to $e^{-cx^2}\cos 2\pi x/p$ or to $e^{-cx^2}\sin 2\pi x/p$. H. Labrouste and Mrs. Labrouste [7] give a powerful operator of this description.

Slutzky (loc. cit. pp. 119-123), Yule [8], and Walker [9] make use of damped harmonic vibration to explain the creation of cycles; while Bartels [10] approaches by a different method the oscillations that do not last.

Now the common graduation formulas have coefficients not conforming strictly to damped vibration, as the tapering ends vibrate more quickly. However, these ends have little more than a smoothing or stabilizing effect. Furthermore, the coefficients for first differences are likely to conform to something like $e^{-cx^2}\sin 2\pi x/p$. Some experimental evidence will be presented for the following conclusion:

If the coefficients $a_i$ of a graduation or linear process (4) appear to conform roughly to equidistant ordinates of a damped vibration, $\pm e^{-cx^2}\cos 2\pi x/p$ or $\pm e^{-cx^2}\sin 2\pi x/p$, with changes of sign at intervals of $p/2$, then when this process (4) is applied to independent chance data having zero mean and constant variance, there is a tendency for the graduated or processed values to change sign at intervals of about $p/2$.

A number of standard graduations have first and second differences, see (6), (7), which bear a decided resemblance to damped vibrations, while the third or fourth differences have only moderate, if any, cyclic appearance. This is especially true of those graduations which are constructed by applying three summings (the number of terms in a sum being in general different) and a fourth process with negative coefficients. This is, indeed, a favorite form of graduation, with which are associated the names of Woolhouse, Spencer, Higham, Kenchington, Henderson, etc. The Spencer 21-term formula, for which some features have already been described [3, p. 262], will now be examined, with special reference to its differences. Cycle length for change of sign is one-half that for change from minus to plus.

TABLE I

Coefficients (×350) for Spencer 21-term graduation and for first four differences. Also theoretical cycle lengths for change in sign in values obtained from random data

Series    Coefficients    Cycle length

Grad.     −1, −3, −5, −5, −2, 6, 18, 33, 47, 57, 60, 57, 47, 33, 18, 6, −2, −5, −5, −3, −1    10.7

1st D.    1, 2, 2, 0, −3, −8, −12, −15, −14, −10, −3, 3, 10, 14, 15, 12, 8, 3, 0, −2, −2, −1    7.0

2nd D.    −1, −1, 0, 2, 3, 5, 4, 3, −1, −4, −7, −6, −7, −4, −1, 3, 4, 5, 3, 2, 0, −1, −1    5.5

3rd D.    1, 0, −1, −2, −1, −2, 1, 1, 4, 3, 3, −1, 1, −3, −3, −4, −1, −1, 2, 1, 2, 1, 0, −1    3.2

4th D.    −1, 1, 1, 1, −1, 1, −3, 0, −3, 1, 0, 4, −2, 4, 0, 1, −3, 0, −3, 1, −1, 1, 1, 1, −1    1.6
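
A table of the kind shown in Table I can be generated for any graduation formula. The sketch below is an illustration with hypothetical helper names: it forms successive difference coefficients by rule (7) and the cycle length for change of sign from (5) and (8), taking the Spencer 21-term coefficients (×350) as input.

```python
from math import acos, pi

# Spencer 21-term coefficients multiplied by 350.
SPENCER_21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
              57, 47, 33, 18, 6, -2, -5, -5, -3, -1]

def difference(coeffs):
    """Rule (7): b_1 = -a_1, b_r = a_(r-1) - a_r, b_(n+1) = a_n."""
    padded = [0] + list(coeffs) + [0]
    return [padded[r] - padded[r + 1] for r in range(len(coeffs) + 1)]

def cos_theta(coeffs):
    """Adjacent products over the sum of squares, as in (5) and (8)."""
    num = sum(x * y for x, y in zip(coeffs, coeffs[1:]))
    den = sum(x * x for x in coeffs)
    return num / den

def sign_change_cycle(coeffs):
    """Cycle length for change of sign: half of 360 degrees / theta."""
    return pi / acos(cos_theta(coeffs))

rows = [SPENCER_21]
for _ in range(4):
    rows.append(difference(rows[-1]))

cycle_lengths = [round(sign_change_cycle(row), 1) for row in rows]
```

Replacing `SPENCER_21` by the coefficients of another formula gives the corresponding rows and cycle lengths for that formula.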

In the graduation formula itself, there are 11 positive coefficients, centrally located, and relatively large as compared with the negative coefficients. This 11 is close to 10.7, the theoretical cycle length for changes of sign of $y_r - 4.5$, the difference between the graduated value $y_r$ and its mean, the arithmetic mean of 0, 1, 2, $\dots$, 9. The structure of the first and second differences also matches closely the corresponding cycle lengths. In the third differences, there is a break at the center; but still there appears considerable regularity. But among fourth differences, the zigzag is the prominent feature. Now the theorem of Section 3 does not really apply to the Spencer formula, with its two summations by fives and one summation by sevens, and another process. But it is not surprising that the cyclicity ceases after passing the third differences.

As a basis for comparing observed values with expected values, the tenth digits in the 600 logarithms from log 200 to log 799 were taken as a random set of numbers. These 600 numbers had been given a Spencer 21-term graduation [3, pp. 261-262], yielding 580 graduated values. From these the 579 first differences were found, the 578 second differences, etc. These numbers, 580, 579, $\dots$, were multiplied respectively by the expected relative frequencies of change in sign of $y_r - 4.5$, of $\Delta y_r$, $\Delta^2 y_r$, etc., as found by use of (5), (8), and similar expressions, to form Table II.

TABLE II

Comparison of expected changes of sign with observed changes for a Spencer 21-term graduation

Series    Expected number of changes from − to +    Observed number of changes from − to +

Graduated values − 4.5    27.2    27

First differences    41.3    42

Second differences    52.9    48

Third differences    90.4    74

Fourth differences    176.7    146

The most abrupt change in frequency or cycle length appears to occur in passing from third to fourth differences. In Table I, this is seen in the configuration of positive and negative terms, and in the drop from 3.2 to 1.6 in cycle length; and in Table II in the corresponding increase in expected sign changes from 90.4 to 176.7. More spectacular is the increase in the number of zigzags represented by −, +, −, +. Among the third differences, there were found only 13 instances of four successive terms with signs as just indicated, whereas among fourth differences there were found 75 such instances. For random material, about 36 such zigzags would be expected, decidedly more than found among the third differences, and decidedly less than found among the fourth differences.

The Spencer 21-term graduation appears to be fairly representative of commonly used graduations as regards regularity or irregularity in the distribution of positive and negative coefficients among the differences. For graduations with a much larger number of terms, the alternation of sign in fourth differences may not be so rapid, as, e.g., in the 35-term 5th degree parabolic graduation which Macaulay [11] calls No. 18. On the other hand, for a formula with non-tapering ends, such as the 13-term formula which Macaulay gives [11, p. 64], the coefficients appearing in the differences are more irregular, especially at the ends. While the Spencer formula is fairly representative, different formulas have distinguishing features. If it is desirable to form an idea of what a given formula will do to random data, a table like Table I can be constructed.

5. Summary. When upon independent chance data, summing, averaging or some more general graduation process is used, the graduated values tend to assume a wavy configuration. These waves often seem to have a fair amount of regularity or cyclicity. The first differences usually, and often other differences of the graduated values, are decidedly cyclic. But, as we go in turn to the higher differences, the cyclicity may weaken. Indeed there may be a return to something like randomness. And subsequent differencings may tend to set up zigzags.

If $(k + 2)$ successive summings by $n$ have been performed on independent chance data, with $n$ not too small, say $n \geq 5$, then $k + 2$ differencings will just about bring back the original chaotic or random condition. But with only $k$ or $(k + 1)$ differencings, a definite cyclicity remains, at least theoretically, in the expected values.

In the case of the Spencer 21-term graduation, the coefficients for the successive differences indicate the appearance of cyclicity in first, second, and third differences.
ON THE DISTRIBUTION OF WILKS' STATISTIC FOR TESTING THE
INDEPENDENCE OF SEVERAL GROUPS OF VARIATES

By A. Wald¹ and R. J. Brookner¹

Columbia University

1. Introduction. We consider $p$ variates $x_1, x_2, \dots, x_p$ which have a joint normal distribution. Let the variates be divided into $k$ groups; group one containing $x_1, x_2, \dots, x_{p_1}$, group two containing $x_{p_1+1}, x_{p_1+2}, \dots, x_{p_2}$, etc. We are interested in testing the hypothesis that the set of all population correlation coefficients between any two variates which belong to different groups is zero.

Wilks² has derived, by using the Neyman-Pearson likelihood ratio criterion, a statistic based on $N$ independent observations on each variate with which one may test this hypothesis. Let $\|r_{ij}\|$ be the matrix of sample correlation coefficients; Wilks' statistic, $\lambda$, is the ratio of the determinant of the $p$-rowed matrix of sample correlations to the product of the $p_1$-rowed determinant of correlations of the variates of group one, the $(p_2 - p_1)$-rowed determinant of correlations of the second group, etc. That is

$\lambda = \dfrac{|r_{ij}|}{|r_{\alpha_1\beta_1}| \cdot |r_{\alpha_2\beta_2}| \cdots |r_{\alpha_k\beta_k}|}$

where $|r_{\alpha_i\beta_i}|$ is the principal minor of $|r_{ij}|$ corresponding to the $i$th group.

In order to use the test, the distribution function of $\lambda$ must be known. Wilks has shown that in certain cases the exact distribution is a simple elementary function; in other cases it is an elementary function, but one which is rather unwieldy and which does not lend itself readily to practical use. It is our purpose in this paper (1) to show a method by which the exact distribution can be explicitly given as an elementary function for a certain class of groupings of the variates, and (2) to give an expansion of the exact cumulative distribution function in an infinite series which is applicable to any grouping.

2. The exact distribution of $\lambda$. By the method to be described, the exact distribution of $\lambda$ can be found when the numbers of variates in the groups are such that there are an odd number in at most one group. If the number of variates is small, say at most eight, the method will increase only slightly the list of distribution functions that Wilks gives in his paper.

¹ Research under a grant-in-aid of the Carnegie Corporation of New York.

² S. S. Wilks, "On the independence of k sets of normally distributed statistical variables," Econometrica, Vol. 3 (1935), pp. 309-326. Other references to Wilks in this paper except where otherwise noted are to this publication.
For purposes of deriving the distribution of $\lambda$ we may assume that $E(x_u) = 0$, $(u = 1, 2, \dots, p)$; that there are $n = N - 1$ independent observations $x_{u\alpha}$ $(\alpha = 1, 2, \dots, n)$ on each variate; and that the sample covariance between $x_i$ and $x_j$ is given by $\frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}x_{j\alpha}$. We define $u'$ (a function of $u$) to be the total number of variables in all the groups which precede the group in which $x_u$ lies. The complete theory is independent of the ordering of the groups and of the ordering of the variates within the groups; hence without loss of generality, we may assume that if any group contains an odd number of variates, it will be the last group, hence $u'$ is always an even integer.

Wilks has shown that $\lambda$ is a product $\prod_{u=p_1+1}^{p} z_u$ where each $z_u$ is distributed independently of the others, and that the distribution of $z_u$ is

(1) $\dfrac{z_u^{\frac{1}{2}(n-u-1)}(1 - z_u)^{\frac{1}{2}u' - 1}}{B[\frac{1}{2}(n - u + 1),\ u'/2]}\,dz_u$.

Now let $y_u = \log z_u$, then the characteristic function of $y_u$ is

$\phi_u(t) = \dfrac{1}{B[\frac{1}{2}(n - u + 1),\ u'/2]}\int_0^1 z_u^{t}\, z_u^{\frac{1}{2}(n-u-1)}(1 - z_u)^{\frac{1}{2}u' - 1}\,dz_u = \dfrac{1}{B[\frac{1}{2}(n - u + 1),\ u'/2]}\int_0^1 z_u^{\frac{1}{2}(n-u-1)+t}(1 - z_u)^{\frac{1}{2}u' - 1}\,dz_u$

where $t$ is a pure imaginary. It is known³ that this integral, even with complex exponents, is the Beta-function so long as the real parts of both exponents are greater than minus one, so

$\phi_u(t) = \dfrac{B[\frac{1}{2}(n - u + 1) + t,\ u'/2]}{B[\frac{1}{2}(n - u + 1),\ u'/2]} = \dfrac{\Gamma[\frac{1}{2}(n - u + 1) + t]\,\Gamma[\frac{1}{2}(n - u + 1 + u')]}{\Gamma[\frac{1}{2}(n - u + 1 + u') + t]\,\Gamma[\frac{1}{2}(n - u + 1)]}$.

But here $u'$ is always an even integer, hence by the well known recursion formula of the Gamma-function, which is valid for complex arguments excluding only negative integers,

$\phi_u(t) = c_u\{[\tfrac{1}{2}(n - u + 1) + t][\tfrac{1}{2}(n - u + 3) + t] \cdots [\tfrac{1}{2}(n - u + u' - 1) + t]\}^{-1}$

where

$c_u = [\tfrac{1}{2}(n - u + 1)][\tfrac{1}{2}(n - u + 3)] \cdots [\tfrac{1}{2}(n - u + u' - 1)]$.

³ See Whittaker and Watson, A Course of Modern Analysis, Fourth edition 1927, Chap. 12.





Now set

$y = \log\lambda = y_{p_1+1} + y_{p_1+2} + \cdots + y_p$

and the characteristic function of $y$ is

$\phi(t) = \prod_{u=p_1+1}^{p} c_u\{[\tfrac{1}{2}(n - u + 1) + t][\tfrac{1}{2}(n - u + 3) + t] \cdots [\tfrac{1}{2}(n - u + u' - 1) + t]\}^{-1}$.

From the characteristic function, we can obtain the distribution function, $g(y)$, of $y$ by the relation

$g(y) = \dfrac{1}{2\pi i}\int_{-i\infty}^{i\infty} e^{-ty}\phi(t)\,dt = \dfrac{c_n}{2\pi i}\int_{-i\infty}^{i\infty} \dfrac{e^{-ty}\,dt}{\prod_{u=p_1+1}^{p}[\tfrac{1}{2}(n - u + 1) + t] \cdots [\tfrac{1}{2}(n - u + u' - 1) + t]}$

where

$c_n = \prod_{u=p_1+1}^{p} c_u$.

The integration can be carried out by the method of residues; since $y$ is always negative (the range of $\lambda$ is from 0 to 1), on a half circle with center at the origin in the negative half of the complex $t$-plane, the integral of the function $e^{-ty}\phi(t)$ converges to zero as the radius of the circle becomes infinite. Since $\phi(t)$ is analytic except for a finite number of poles on the negative real axis, $g(y)$ is $c_n$ times the sum of the residues at these points.

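Now $\phi(t)$ is of the form $c_n/P(t)$, where $P(t)$ is a polynomial in $t$ as follows: suppose that the groups contain $r_1, r_2, \dots, r_k$ variables respectively, then let $(k_j + 1)$ be the number of these $r$'s which are greater than or equal to $j$; then

$P(t) = [\tfrac{1}{2}(n - 2) + t]^{k_1}[\tfrac{1}{2}(n - 3) + t]^{k_2}[\tfrac{1}{2}(n - 4) + t]^{k_1 + k_3}[\tfrac{1}{2}(n - 5) + t]^{k_2 + k_4}[\tfrac{1}{2}(n - 6) + t]^{k_1 + k_3 + k_5} \cdots [\tfrac{1}{2}(n - p + 1) + t]^{s_{p-2}}$

where

$s_\sigma = k_\sigma + k_{\sigma - 2} + k_{\sigma - 4} + \cdots$,

the sum ending with $k_2$ if $\sigma$ is even and with $k_1$ if $\sigma$ is odd. Then

$g(y; n, r_1, \dots, r_k) = c_n \sum_{\sigma=1}^{p-2} \dfrac{1}{(s_\sigma - 1)!}\left[\dfrac{\partial^{s_\sigma - 1}}{\partial t^{s_\sigma - 1}}\, \dfrac{[t + \frac{1}{2}(n - \sigma - 1)]^{s_\sigma}\, e^{-ty}}{P(t)}\right]_{t = -\frac{1}{2}(n - \sigma - 1)}$,

a term with $s_\sigma = 0$ contributing nothing. It can be shown that $s_\sigma \geq 0$ for $\sigma$ between 1 and $p - 2$. Thus we have $g(y; r_1, r_2, \dots, r_k)$ and from it we can calculate $f(\lambda; r_1, r_2, \dots, r_k)$.

Suppose $p = 8$ and that the variables are divided into two groups of four each, then we will calculate the distribution function $f(\lambda; 4, 4)$. Now

$g(y; 4, 4) = \dfrac{c_n}{2\pi i}\int_{-i\infty}^{i\infty} \dfrac{e^{-ty}\,dt}{[\frac{1}{2}(n - 2) + t][\frac{1}{2}(n - 3) + t][\frac{1}{2}(n - 4) + t]^2[\frac{1}{2}(n - 5) + t]^2[\frac{1}{2}(n - 6) + t][\frac{1}{2}(n - 7) + t]}$

and

$c_n = \dfrac{(n - 2)(n - 3)(n - 4)^2(n - 5)^2(n - 6)(n - 7)}{256}$.

Then

$g(y; 4, 4) = 16c_n\left[-\dfrac{e^{\frac{1}{2}(n-2)y}}{90} + \dfrac{e^{\frac{1}{2}(n-3)y}}{6} + e^{\frac{1}{2}(n-4)y}\left(\dfrac{8}{9} - \dfrac{y}{3}\right) - e^{\frac{1}{2}(n-5)y}\left(\dfrac{8}{9} + \dfrac{y}{3}\right) - \dfrac{e^{\frac{1}{2}(n-6)y}}{6} + \dfrac{e^{\frac{1}{2}(n-7)y}}{90}\right]$.

Since $y = \log\lambda$, $dy/d\lambda = 1/\lambda$, we have

$f(\lambda; 4, 4) = \dfrac{16c_n}{3}\left[-\dfrac{\lambda^{\frac{1}{2}(n-4)}}{30} + \dfrac{\lambda^{\frac{1}{2}(n-5)}}{2} + \dfrac{8\lambda^{\frac{1}{2}(n-6)}}{3} - \dfrac{8\lambda^{\frac{1}{2}(n-7)}}{3} - \dfrac{\lambda^{\frac{1}{2}(n-8)}}{2} + \dfrac{\lambda^{\frac{1}{2}(n-9)}}{30} - \left(\lambda^{\frac{1}{2}(n-6)} + \lambda^{\frac{1}{2}(n-7)}\right)\log\lambda\right]$.

The cumulative distribution function is given by

$I_w(4, 4) = \text{Prob}[\lambda \leq w; 4, 4] = \dfrac{16c_n}{3}\left[\dfrac{w^{\frac{1}{2}(n-7)}}{15(n - 7)} - \dfrac{w^{\frac{1}{2}(n-6)}}{n - 6} - \dfrac{4(4n - 23)\,w^{\frac{1}{2}(n-5)}}{3(n - 5)^2} + \dfrac{4(4n - 13)\,w^{\frac{1}{2}(n-4)}}{3(n - 4)^2} + \dfrac{w^{\frac{1}{2}(n-3)}}{n - 3} - \dfrac{w^{\frac{1}{2}(n-2)}}{15(n - 2)} - \left(\dfrac{2w^{\frac{1}{2}(n-5)}}{n - 5} + \dfrac{2w^{\frac{1}{2}(n-4)}}{n - 4}\right)\log w\right]$.

Wilks' expression for the cumulative distribution function appears to be quite different, but if we substitute $n = N - 1$ and use the relation

$I_{\sqrt{w}}(N - 6; 4) = \dfrac{\Gamma(n - 1)}{\Gamma(n - 5)\,\Gamma(4)}\int_0^{\sqrt{w}} x^{n-6}(1 - x)^3\,dx = \tfrac{1}{6}(n - 2)(n - 3)(n - 4)(n - 5)\left[\dfrac{w^{\frac{1}{2}(n-5)}}{n - 5} - \dfrac{3w^{\frac{1}{2}(n-4)}}{n - 4} + \dfrac{3w^{\frac{1}{2}(n-3)}}{n - 3} - \dfrac{w^{\frac{1}{2}(n-2)}}{n - 2}\right]$

it can be shown that the two formulas for the cumulative distribution are identical.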
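
A quick consistency check on the case of two groups of four is that the cumulative distribution must equal 1 at $w = 1$. The sketch below is an illustration with hypothetical function names; it assumes the closed form for $I_w(4, 4)$ and the constant $c_n = (n-2)(n-3)(n-4)^2(n-5)^2(n-6)(n-7)/256$ obtained from the product of the $c_u$. The check at $w = 1$ is done in exact rational arithmetic, and a floating-point version is given for other values of $w$.

```python
from fractions import Fraction
from math import log

def c_n(n):
    """Product of the c_u for p = 8 and two groups of four (n = N - 1)."""
    return Fraction((n - 2) * (n - 3) * (n - 4) ** 2 * (n - 5) ** 2
                    * (n - 6) * (n - 7), 256)

def total_probability(n):
    """Exact value of the cumulative distribution at w = 1 (log w = 0)."""
    bracket = (Fraction(1, 15 * (n - 7)) - Fraction(1, n - 6)
               - Fraction(4 * (4 * n - 23), 3 * (n - 5) ** 2)
               + Fraction(4 * (4 * n - 13), 3 * (n - 4) ** 2)
               + Fraction(1, n - 3) - Fraction(1, 15 * (n - 2)))
    return Fraction(16, 3) * c_n(n) * bracket

def cdf(w, n):
    """Floating-point Prob[lambda <= w] for 0 < w <= 1 and n > 7."""
    def pw(m):
        return w ** ((n - m) / 2)  # w raised to the power (n - m)/2
    bracket = (pw(7) / (15 * (n - 7)) - pw(6) / (n - 6)
               - 4 * (4 * n - 23) * pw(5) / (3 * (n - 5) ** 2)
               + 4 * (4 * n - 13) * pw(4) / (3 * (n - 4) ** 2)
               + pw(3) / (n - 3) - pw(2) / (15 * (n - 2))
               - (2 * pw(5) / (n - 5) + 2 * pw(4) / (n - 4)) * log(w))
    return float(Fraction(16, 3) * c_n(n)) * bracket
```

The floating-point version involves large cancelling terms when $n$ is large, so it is best used for moderate $n$; the exact check does not have this limitation.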

In cases where $u'$ is not always an even integer, the exact distribution function of $\lambda$ can still be obtained using this method. However, in such a case, the gamma functions do not cancel out and the integrand has an infinitude of poles, so the function is expressed by an infinite series. We will use a different method to obtain an infinite series expansion.


3. A series expansion of the cumulative distribution function. Let us put
v = -y, and let the density function of v be h(v); then from (2), we have


A(t;) iv 


dn — /* ** 6** 7T ^ ^ 

Ttvi u— ri+i r[§(n — w + 1 + u') + i] 


Since v is a monotonic decreasing function of λ, and since the critical region for
testing the null hypothesis is given by the inequality λ < λ_0, the critical
region will be defined by v > v_0, where v_0 is such that

$$\int_{v_0}^{\infty} h(v)\,dv$$

is equal to a chosen level of significance.
Proposition 1. 


= hn{v)i{v) 

where ^(v) does not depend on n, and hn{v) = CnC”*''. 
Proof: Let 


t' ^ t + \{n — p). 


Then 

'pr r[^(^ — u + 1) -h V]di* 

2tz J_<op+4(n~p) u r[i(p ~ w + tt' + 1) + 

Now the area in the complex plane bounded by the vertical line through ½(n - p),
by the vertical line through the origin, and by arcs of a circle with center at the
origin of arbitrary radius is one in which the integrand is everywhere regular.
Furthermore, the integral along the arcs approaches zero as the radius of the
circle approaches infinity; hence the integrals along the vertical line through
½(n - p) and along the vertical axis are equal. Then we may write

1 /*** V{i»+pt%) TT r[j(p — w + 1) + i^]dV 

C. 2vi V r[i(p - w + u' + 1) + n 

= 

Therefore 

Uv) = 
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Proposition 2. 


where we define 


80 that 


lim [ 
-••0 


' Cne~^*v^^dv 

W) 


r- E E'f 


r * §[rjri + rt(ri + r») + • • • + r»(ri + r* + • . • + r*_i)] 

hy'.u'. 

U 

Proof: Let 


then 


Hence 


but 



V 


* 



_ TT ri(n — w + 1 + w') 
" V ri(n - u + 1) 


and therefore 


== lim ^ (2y''* „ 1 

n-.« r§(n — tt + 1) \n/ 

by an application of the Stirling approximation. Therefore 

/ = n * 1 . 


We then write 





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hence 

(3) hiv) - 

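Proposition 3. For any positive integer s,

$$\lim_{n\to\infty} n^{s}\,\mathrm{Prob}\{v > 1/\sqrt{n}\} = 0.$$

Proof: Since $v = -\log\lambda$, the inequality $v > 1/\sqrt{n}$ is equivalent to the
inequality $\lambda < e^{-1/\sqrt{n}}$. Since $\lambda = \prod_{u=p_1+1}^{p} z_u$, the inequality
$\lambda < e^{-1/\sqrt{n}}$ implies that there exists at least one value of u for which

$$z_u < e^{-1/[(p-p_1)\sqrt{n}]}.$$

Hence

$$\sum_u P\bigl(z_u < e^{-1/[(p-p_1)\sqrt{n}]}\bigr) \ge P\bigl(\lambda < e^{-1/\sqrt{n}}\bigr) = P\bigl(v > 1/\sqrt{n}\bigr).$$

Hence in order to prove Proposition 3 we have only to show that for each u and
any arbitrary positive integer s

$$\lim_{n\to\infty} n^{s}\,P\bigl(z_u < e^{-1/[(p-p_1)\sqrt{n}]}\bigr) = 0.$$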


From (1) we have 
Pizu < 


r 




B[J(n - tt + 1); u'/2] 4 
Over the range of integration, we have Zu < c-V(p~pi)V» bo 


^i(-— i)(i _ 


Pizu < < 




)^/n r* 

1V/2]1 


B[|(n - M -H 1) 
B[§(» - « + 1) 

2^— lu~«— 1) /(j»-pi ) 


(1 - 


ii'-B[J(n — tt + l);uV2] 
It follows from the Stirling formula that 

v»'/* 


[1 — (1 — 


V ^/oi 1- ri(n - « + i)r(«72) /»V''* 

toy Mtc. - . + 1); .721 - to ^ y 

« r(tt72). 
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Since 


lim n*'*^^*c V«/2(i>-pi) as o 


and 

lim (1 — (1 — = 1, 


the proposition follows. 

Proposition 4. The function ψ(v) of formula (3) can be expanded in a power
series, i.e.

$$\psi(v) = \alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots$$

with a finite radius of convergence.

Proof; Wilks^ has considered the following integral equation: 


w^g(w)dw = CB^ 


T(bi + t) • r (62 + 0 • • 
T(ci + t)-T((^ + t) .. 


r(M- 1) 

tic, + 0’ 


where C == ^ ^ independent of t, and bi < Ci 

(t = 1, 2, • • • , g). Wilks has shown that the solution of the integral equation, 
g{w), is given by the following expression: 


g{w) = ~ ^ 






(4) 


where 


and 




|b2— Cl 


. . . vjl-i *■*•-■"* 
Jo 

X (1 - ... (1 - 

X [l - »i(l - j^l - {t»i + t)*(l - w,)} ^1 - 

X j^l — {t»i + t^(l — «i) + • • • 

+ - Di)(l - t;*) ... (1 - 




X dvi dv% • •• dvq^i 


fc = II 


r(c,) 


fJi r(6,)r(ci - hi) 


7< = C^/ = 23 




y-o 


^ S. S. Wilks, ^'Certain generalizations in the analysis of variance,’’ Biometrika, Vol. 24 
(1932), pp. 474-6. 
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the range of w being 0 ^ w ^ Wilks has furthermore shown that 

for w > 0 and 0 ^ g 1 (i = 1, 2, • • • , g — 1). 

We denote the left hand side of (5) by f<. lie factor (1 — can be 

expanded in a power series, i.e. 

(6) (1 - = (1 - 

= 1 + (c<+i — 6,)f,- + i(c<+i — bi){Ci+i — 6< + 1){^ + • • • 

with a radius of convergence equal to one. Since we will show shortly that, for
the choices we make for the b's and c's, c_{i+1} ≥ b_i, all coefficients in this
last expansion are non-negative. Substituting this series expansion (6) in (4),
and ordering it according to powers of (1 - w/B), the expression under the
integral sign in (4) becomes

$$\theta_0(v_1, v_2, \ldots, v_{q-1}) + \theta_1(v_1, \ldots, v_{q-1})\Bigl(1 - \frac{w}{B}\Bigr) + \theta_2(v_1, \ldots, v_{q-1})\Bigl(1 - \frac{w}{B}\Bigr)^2 + \cdots. \tag{7}$$

This series is uniformly convergent over the domain defined by the inequalities
0 ≤ v_i ≤ 1 (i = 1, 2, ..., q - 1) and |1 - w/B| < 1. We can even say that
(7) is uniformly convergent for |1 - w/B| < 1 if we substitute for each θ_i
the maximum of θ_i with respect to v_1, v_2, ..., v_{q-1}. Hence we may integrate
the series (7) with respect to v_1, v_2, ..., v_{q-1} term by term, i.e.

$$\int_0^1\!\cdots\!\int_0^1 \Bigl\{\theta_0 + \theta_1\Bigl(1 - \frac{w}{B}\Bigr) + \cdots\Bigr\}\, dv_1\, dv_2 \cdots dv_{q-1} = \sigma_0 + \sigma_1\Bigl(1 - \frac{w}{B}\Bigr) + \sigma_2\Bigl(1 - \frac{w}{B}\Bigr)^2 + \cdots \tag{8}$$

and the series (8) is uniformly convergent for |1 - w/B| < 1. The coefficients
σ_0, σ_1, ... are non-negative.

The case of the λ statistic which we are considering is a special case of this
integral equation which we obtain by making the following substitutions:

$$w = \lambda, \qquad B = 1, \qquad u = r + p_1, \qquad q = p - p_1,$$

$$b_r = \tfrac12(n - u + 1), \qquad c_r = \tfrac12(n - u + u' + 1), \qquad (r = 1, 2, \ldots, p - p_1).$$

Note that then

$$c_{r+1} - b_r = \tfrac12[(u + 1)' - 1] \ge 0.$$

Hence, according to (4) 

g(X) dK = - X)‘*'‘'"‘{ffo + <ri(l - X) + <r,(l - X’) + . . . } dX 

where the infinite series converges for |1 - λ| < 1.

Now v = -log λ, or λ = e^{-v}; hence

hiv) du = j dp 


{oi + t>»(l — Tj) + • • • + f<(l ~ *'»)(! 


(1 - i 
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where the series {«o + «it» + «»«* + •••} is obtained from the series {ffo + 
0 - 1(1 — X) -+- • • • } by substituting for (1 — X) the Taylor expansion of (1 — e“*). 
The series { *0 + eii' + + • • • } has a finite radius of convergence.® 

Hence the function ^(r) can be written as 


iiv) = {e„ + e, „ + j 

— - — j can be 

expanded in a Taylor series around i; = 0, Proposition 4 is proved. 


4. Evaluation of the coefficients in the expansion of ψ(v). Let the series
expansion of ψ(v) be

$$\psi(v) = \alpha_0 + \alpha_1 v + \alpha_2 v^2 + \cdots$$


Then we have 


i 




r(r) 


(<*0 "b "b ottv^ "b • • •) dt) ^ 1 . 


Tit 

Now let V* — xv, then 
2 


Jo \nj r(r) V 


, 2aiv* , 4a2r’*‘ 

ao H r 

n 




do* SB 1 . 


Suppose that the asymptotic expansion of $\left(\dfrac{n}{2}\right)^{r} \dfrac{1}{c_n}$ is given by

$$\beta_0 + \frac{\beta_1}{n} + \frac{\beta_2}{n^2} + \cdots$$


On account of Proposition 3, we have that the asymptotic expansion in powers 
of 1/n of 


(9) 


rVw ^ ( , 2ai * . 4a2 *• i \ j ♦ 


must be equal to the asymptotic expansion of $(n/2)^{r}/c_n$. Since we may integrate
in (9) term by term for sufficiently large n, we easily obtain

$$\alpha_0 = \beta_0, \qquad \alpha_1 = \frac{\beta_1}{2r}, \qquad \ldots, \qquad \alpha_k = \frac{\beta_k}{2^{k}\, r(r+1)\cdots(r+k-1)}.$$


‘ See A. Gutzmer, Theorie der eindeutigen analytischen Funktionen, 1906, pp. 91-2.
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The asymptotic expansion of 


nV 1 

, 2 / Cn 


can be calculated in the following manner: 


and 


fl _1_ ft ft _L 

(n + 2Y ^ n + 2 ^ in + 2)* 

v " *, + & + &+ 

n n* 



(1 + 2/ny n 

tt 


n — u + 1 
w — -u + li' + 1 * 


Equating the right hand members of these last two equations, and taking: 
logs, we obtain 

log [/So + + . . . ] = r log (1 + 2/») + E log (l - ) 

-Eiog(i-“““~ + + t + ••*)• 


Then we expand each term in a series of powers of 1/n and equate coefficients 
of 1 /n‘ for each i. We obtain the following formulae for the first five jS’s: 

/ 3 « = 1 

|Ji = r + i E (w — O’ — J 2 (« — «' — D’ 

u u 

ft = ft + ^ ^ ^ — 0’ — ^ E (w ~ “ 0* 

|8» = “♦ft ~ di ~ i/s! + ftft + 2ft + |r 

+ A 23 (« - 0* - A £(«-«'- 1)* 

% u 

p4 == 2fii + 2^1 + jSi + X + PiPi — fiipt — 4/8i 

4 

+ I + 3ft - + 1E (« - 0‘ - (u - - 1)‘. 


5. Practical use of the series. In practical applications, the value of the
statistic, say λ_0, is calculated, and it is desired to determine whether or
not this value of the statistic falls into the critical region. That is, for a
particular grouping of the variates, for a particular number of degrees of freedom,
and for a chosen level of significance α, there is determined from the distribution
of λ a value λ* such that

$$\mathrm{Prob}[\lambda < \lambda^*] = \alpha,$$

and if λ_0 < λ* we reject the hypothesis that in the population from which the
sample is taken all the correlation coefficients between variates in different
groups are zero.

Since v is a monotonic decreasing function of λ we make the test by computing
v_0 = -log λ_0 and we reject the hypothesis if v_0 > v*, where v* = -log λ*. But
this is equivalent to computing Prob[v > v_0], and if this value is less than α we
reject the hypothesis. Now

Prob [a > ao] = Jttiri , rj , • • • , r*) 

= f e ’(1 + aiv + o*a* + • • •) da. 
r(r) Jpo 


Setting 2 “ * 

P ”*’ 1 " > '•1 “ (I) ^ (s) *' + • • • ] *■ 


On account of Proposition 3 we obtain an asymptotic expansion of Prob[v > v_0]
by integrating the right hand member of the above equation term by term.
This can be expressed by means of the incomplete gamma function, which is
tabulated* in the form

$$I(u, p) = \frac{\int_0^{u\sqrt{p+1}} v^{p} e^{-v}\, dv}{\Gamma(p+1)}.$$

We obtain

$$\mathrm{Prob}[v > v_0] = \Bigl(\frac{2}{n}\Bigr)^{r} c_n \Bigl\{ \Bigl[1 - I\Bigl(\frac{n v_0}{2\sqrt{r}},\, r-1\Bigr)\Bigr] + \frac{\beta_1}{n}\Bigl[1 - I\Bigl(\frac{n v_0}{2\sqrt{r+1}},\, r\Bigr)\Bigr] + \frac{\beta_2}{n^2}\Bigl[1 - I\Bigl(\frac{n v_0}{2\sqrt{r+2}},\, r+1\Bigr)\Bigr] + \cdots \Bigr\}.$$

The values of the constant $K = (2/n)^{r} c_n$ and the values of β_1, β_2, β_3, β_4 are
herein tabulated for any grouping which might be made on six or fewer variates.
Some cases are superfluous here: the groupings (1, p - 1), for which the
distribution of λ is the distribution of the multiple correlation coefficient, and
the groupings (2, p - 2), the exact distribution for which was given by Wilks as
an incomplete Beta-function. These cases are included only for the sake
of completeness.
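The use of the tabulated constants can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' computing procedure: the function names are hypothetical, c_n is taken to be the product over u = p_1 + 1, ..., p of Γ[½(n - u + 1 + u')]/Γ[½(n - u + 1)], with u' the number of variates in the groups preceding the one containing u (a reading that reproduces the tabulated values of r and K), and the tail terms 1 - I(·, ·) are evaluated in closed form, which restricts the sketch to groupings with integer r.

```python
from math import exp, factorial, lgamma, log


def r_and_K(groups, n):
    """Return (r, K) for a grouping such as (2, 1), where K = (2/n)**r * c_n.

    Assumption: c_n is the product of gamma-function ratios described in the
    lead-in, and u' is the number of variates in the preceding groups.
    """
    r = 0.0
    log_c = 0.0
    u = groups[0]          # index of the last variate of the first group (p_1)
    preceding = groups[0]  # u' for the variates of the current group
    for size in groups[1:]:
        for _ in range(size):
            u += 1
            r += preceding / 2
            log_c += lgamma((n - u + 1 + preceding) / 2) - lgamma((n - u + 1) / 2)
        preceding += size
    return r, exp(r * log(2 / n) + log_c)


def upper_gamma(m, x):
    """1 - I(x / sqrt(m), m - 1) for a positive integer m (closed form)."""
    return exp(-x) * sum(x ** j / factorial(j) for j in range(m))


def tail_probability(groups, betas, n, v0):
    """Truncated series for Prob[v > v0], using the tabulated beta_1, beta_2, ..."""
    r, K = r_and_K(groups, n)
    if r != int(r):
        raise ValueError("this sketch handles integer r only")
    x = n * v0 / 2
    coefficients = [1.0] + list(betas)  # beta_0 = 1
    return K * sum(b / n ** k * upper_gamma(int(r) + k, x)
                   for k, b in enumerate(coefficients))


# Grouping (2, 1) with n = 20 and the four tabulated betas 2, 4, 8, 16.
print(r_and_K((2, 1), 20))
print(tail_probability((2, 1), [2, 4, 8, 16], 20, 0.3))
```

For half-integer r the same series applies, but the incomplete gamma function must then be taken from tables or from a numerical library.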


* K. Pearson (Editor), Tables of the Incomplete Gamma Function, Biometric Laboratory,
London, 1922.

Table of the First Four β's

Grouping        r      β1       β2           β3            β4
2,1             1      2        4            8             16
1,1,1           1.5    2.75     6.28125      13.38281      27.57568
3,1             1.5    3.75     12.03125     36.91406      111.55225
2,2             2      5        19           65            211
2,1,1           2.5    5.75     23.53125     83.97656      279.50538
1,1,1,1         3      6.5      28.625       106.9375      366.39844
4,1             2      6        28           120           496
3,2             3      9        55           285           1351
3,1,1           3.5    9.75     62.53125     334.10156     1615.91163
2,2,1           4      11       77           439           2229
2,1,1,1         4.5    11.75    86.03125     506.16406     2628.23974
1,1,1,1,1       5      12.5     95.625       580.6875      3085.52344
5,1             2.5    8.75     55.78125     315.82031     1690.65282
4,2             4      14       125          910           5901
3,3             4.5    15.75    154.03125    1205.03906    8277.55226
4,1,1           4.5    14.75    136.28125    1015.50781    6693.45068
3,2,1           5.5    17.75    189.53125    1584.10156    11445.75538
2,2,2           6      19       214          1866          13947
3,1,1,1         6      18.5     203.625      1740.9375     12797.27344
2,2,1,1         6.5    19.75    229.03125    2042.16406    15530.08351
2,1,1,1,1       7      20.5     244.625      2230.1875     17257.64836
1,1,1,1,1,1     7.5    21.25    260.78125    2430.49219    19139.02892

Tables of the Constant K = (2/n)^r c_n


  n     2,1    1,1,1   3,1    2,2    2,1,1  1,1,1,1  4,1    3,1,1
 10    .800    .738   .646   .560    .517    .477   .480    .310
 11    .818    .761   .676   .595    .553    .515   .521    .352
 12    .833    .780   .702   .625    .585    .548   .556    .390
 13    .846    .796   .724   .651    .612    .576   .586    .424
 14    .857    .810   .743   .674    .637    .602   .612    .455
 15    .867    .822   .759   .693    .658    .624   .636    .482
 16    .876    .833   .774   .711    .677    .646   .656    .508
 17    .882    .843   .787   .727    .694    .663   .675    .531
 18    .889    .851   .798   .741    .709    .679   .691    .552
 19    .895    .859   .808   .754    .723    .694   .706    .571
 20    .900    .866   .818   .765    .736    .708   .720    .589
 22    .909    .878   .834   .786    .758    .732   .744    .620
 24    .917    .888   .847   .802    .777    .752   .764    .647
 26    .923    .896   .859   .817    .793    .770   .781    .671
 28    .929    .903   .869   .829    .807    .785   .796    .691
 30    .933    .910   .877   .840    .819    .798   .809    .710
 35    .943    .922   .894   .862    .843    .825   .835    .747
 40    .950    .932   .908   .879    .862    .846   .856    .776
 45    .956    .940   .918   .892    .877    .862   .871    .799
 50    .960    .946   .926   .902    .889    .875   .883    .818
 55    .964    .950   .932   .911    .899    .886   .894    .833
 60    .967    .954   .938   .918    .907    .895   .902    .846
 65    .969    .958   .943   .924    .914    .903   .910    .858
 70    .971    .961   .947   .930    .920    .910   .916    .867
 80    .976    .966   .953   .938    .930    .921   .926    .883
 90    .978    .970   .959   .945    .937    .929   .934    .896
100    .980    .973   .963   .951    .943    .936   .941    .906

Tables of the Constant K (ii)


  n    2,2,1  2,1,1,1   3,2   1,1,1,1,1  5,1    4,2    3,3
 10    .269    .248    .336     .229    .323   .168   .136
 11    .310    .288    .379     .268    .369   .206   .171
 12    .347    .325    .417     .304    .410   .243   .205
 13    .381    .359    .451     .338    .445   .277   .237
 14    .412    .390    .481     .368    .478   .309   .268
 15    .441    .418    .508     .397    .506   .339   .297
 16    .467    .444    .533     .423    .532   .367   .324
 17    .490    .468    .556     .447    .555   .392   .350
 18    .512    .490    .576     .470    .576   .416   .374
 19    .532    .511    .595     .490    .596   .438   .396
 20    .551    .530    .612     .510    .613   .459   .417
 22    .584    .564    .642     .544    .644   .496   .455
 24    .613    .593    .668     .575    .671   .529   .489
 26    .638    .619    .691     .601    .694   .558   .519
 28    .660    .642    .711     .625    .714   .584   .546
 30    .680    .662    .728     .646    .731   .607   .570
 35    .720    .704    .764     .689    .767   .654   .621
 40    .751    .737    .791     .723    .794   .692   .661
 45    .776    .763    .813     .751    .816   .722   .694
 50    .797    .785    .830     .773    .833   .747   .721
 55    .814    .803    .845     .792    .848   .768   .743
 60    .828    .818    .857     .808    .860   .786   .762
 65    .841    .831    .868     .822    .870   .801   .779
 70    .852    .842    .877     .833    .879   .814   .793
 80    .869    .861    .892     .853    .894   .836   .817
 90    .883    .876    .903     .869    .905   .853   .836
100    .894    .888    .913     .881    .915   .867   .852

Tables of the Constant K (iii)


  n    4,1,1  3,2,1  2,2,2  3,1,1,1  2,2,1,1  2,1,1,1,1  1,1,1,1,1,1
 10    .155   .108   .094    .100     .087      .080        .076
 11    .192   .140   .123    .130     .114      .106        .099
 12    .228   .171   .152    .160     .142      .133        .125
 13    .261   .201   .180    .189     .170      .160        .150
 14    .292   .230   .208    .217     .197      .186        .176
 15    .322   .257   .235    .244     .223      .212        .201
 16    .349   .284   .261    .270     .248      .236        .225
 17    .375   .309   .285    .295     .272      .260        .248
 18    .398   .332   .308    .318     .295      .283        .271
 19    .421   .354   .330    .340     .317      .304        .292
 20    .442   .375   .351    .361     .338      .325        .313
 22    .479   .414   .390    .400     .376      .363        .351
 24    .512   .448   .424    .434     .411      .398        .385
 26    .542   .479   .456    .465     .442      .430        .417
 28    .568   .507   .484    .493     .471      .458        .446
 30    .591   .532   .510    .519     .497      .484        .472
 35    .640   .585   .564    .573     .552      .540        .528
 40    .679   .628   .608    .616     .597      .585        .574
 45    .710   .663   .644    .652     .633      .623        .612
 50    .736   .692   .674    .681     .664      .654        .644
 55    .758   .716   .700    .706     .690      .681        .671
 60    .776   .737   .722    .728     .712      .704        .695
 65    .792   .755   .740    .746     .732      .723        .715
 70    .805   .771   .757    .762     .749      .741        .733
 80    .828   .797   .784    .789     .777      .770        .762
 90    .846   .818   .806    .811     .800      .793        .786
100    .860   .835   .824    .828     .818      .812        .806



THE MEAN SQUARE SUCCESSIVE DIFFERENCE 

By J. von Neumann,¹ R. H. Kent, H. R. Bellinson and B. I. Hart
Aberdeen Proving Ground 


1. Introduction. In making measurements, every precaution is generally 
taken to hold the conditions of the experiment constant, in order that the 
population, whose parameters are to be estimated from the observations, shall 
remain fixed throughout the experiment. One wishes each observation to come 
from the same population, or what is the same thing if normality is assumed, 
from populations having the same means and standard deviations. 

There are cases, however, where the standard deviation may be held constant, 
but the mean varies from one observation to the next. If no correction is made 
for such variation of the mean, and the standard deviation is computed from 
the data in the conventional way, then the estimated standard deviation will 
tend to be larger than the true population value. When the variation in the
mean is gradual, so that a trend (which need not be linear) is shifting the mean
of the population, a rather simple method of minimizing the effect of the trend
on dispersion is to estimate standard deviation from differences. It is for this
purpose that the mean square successive difference

$$\delta^2 = \frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{n - 1}$$

is suggested. The subscript i in this expression refers to the temporal order of
the observation x_i.

In using δ² for estimating standard deviation, the distribution of δ² in random
samples is of interest, since questions of bias, efficiency, and confidence interval
require consideration. δ² may be used, in addition, to determine whether a
trend actually exists; in this case one must know whether δ² differs significantly
from

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n},$$

which measures variance independently of the order of the observations, and
consequently includes the effect of the trend.


¹ Institute for Advanced Study, Princeton, N. J. Also member of Scientific Advisory
Committee of the Ballistic Research Laboratory, Aberdeen Proving Ground.

The distribution of δ² is considered in this paper; it is hoped that others will
shortly publish methods of estimating the probability that δ² ≤ ks² as a function
of k and the sample size n.


2. History. A somewhat similar procedure is suggested by "Student" [1]
and E. S. Pearson [2], who consider the situation in which a shift may occur in
the mean of the population, but where pairs of observations may be made with
no shift in mean between them; standard deviation may be estimated from the
differences between these pairs. The method can be generalized, and

$$s' = \sqrt{\frac{\sum_{i=1}^{n/2} (x_{2i} - x_{2i-1})^2}{n}}$$

is an estimate of the standard deviation; n must, of course, be an even integer.
This estimate has the advantage that its properties are fully known: s' is
distributed as the standard deviation with f = n/2 degrees of freedom. It will be
noted that this estimate does not involve the successive differences, but only
the alternate ones. Although there are n - 1 available successive differences,
this estimate uses only the n/2 independent differences. The mean square
successive difference is based on all n - 1 successive differences, and should
therefore provide a more efficient estimate of σ than does s'.

There is, of course, nothing new in the concept of estimating the standard 
deviation from differences. Even as far back as 1870, an interest in the method 
appears to have existed. Jordan [3] devised methods based on sums of powers 
of the differences. Helmert [4] gave more careful consideration to the case of 
the first power, i.e. the sum of the absolute differences. In both these cases, 
however, all the n(n — l)/2 differences that can be established from a sample of 
n observations were included in the estimate, so that the estimate was of no 
value in reducing the effect of a trend. Helmert realized this, for he pointed 
out that the estimate obtained from the sum of squares of the differences is 
exactly that obtained by the more conventional procedure of squaring deviations 
from the mean. 
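Helmert's observation can be checked numerically. The following Python sketch is an added illustration with hypothetical function names, not one of the historical methods: on a small made-up sample it confirms that the sum of squares of all n(n - 1)/2 differences equals n times the sum of squared deviations from the mean, so an estimate built on all the differences carries a trend exactly as the conventional estimate does.

```python
from itertools import combinations


def sum_of_squared_pairwise_differences(xs):
    """Sum of (x_i - x_j)**2 over all n(n-1)/2 unordered pairs."""
    return sum((a - b) ** 2 for a, b in combinations(xs, 2))


def sum_of_squared_deviations(xs):
    """Sum of squared deviations from the sample mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


sample = [1.0, 2.0, 4.0, 7.0]  # made-up observations
n = len(sample)
print(sum_of_squared_pairwise_differences(sample))  # 84.0
print(n * sum_of_squared_deviations(sample))        # 84.0
```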

The usefulness of the differences between successive observations only appears
to have been realized first by ballisticians, who faced the problem of minimizing
effects due to wind variation, heat and wear in measuring the dispersion of the
distance traveled by shell. Vallier [5] appears to have been the first to estimate
dispersion from successive differences. Cranz and Becker [6] commended the
mean successive difference

$$E_d = \frac{\sum_{i=1}^{n-1} |x_{i+1} - x_i|}{n - 1}.$$

To establish the precision of E_d in estimating σ, Cranz and Becker quoted
Helmert's paper, and so erred in saying that their method was superior to that
of the mean deviation. Helmert's procedure, based on n(n - 1)/2 differences,
is indeed more precise (for n > 10) than the mean deviation

$$\text{m.d.} = \frac{\sum |x_i - \bar{x}|}{n},$$

but the mean successive difference is based on but n - 1 differences, and so is
not as precise.

Bennett [7] appears to have suggested the use of successive differences
independently of the European ballisticians. In recent years, the method of
estimation by the mean square successive difference δ² was put into practice in the
Ballistic Research Laboratory at the Aberdeen Proving Ground, U. S. Army,
by L. S. Dederick.


3. Bias and efficiency. The moments of δ² in samples drawn from a normal
population are derived in Section 6 of this paper. The moments are used at
this point to establish the estimate of variance, and the efficiency of this estimate.
The mean value of δ² in samples taken at random from a normal population is

$$E(\delta^2) = 2\sigma^2. \tag{3}$$

δ² consequently offers an unbiased estimate of variance, and this estimate is

$$\frac{\delta^2}{2} = \frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{2(n - 1)}. \tag{4}$$

The second moment, i.e., the variance, of δ² in samples of size n is

$$\mu_2 = \frac{4(3n - 4)}{(n - 1)^2}\,\sigma^4. \tag{5}$$

As the sample size is increased, the distribution of δ² appears to approach
the normal. It is therefore appropriate to consider the efficiency as defined by
Fisher [8]. Accordingly, the efficiency of δ² is


Since 



2 

<r,i 


2(n-l) 4 

= . 




n — 1 


2 


and 
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the efficiency of {* in estimating the standard deviation is 

$$E = \frac{2(n - 1)}{3n - 4} = \frac{2}{3}\Bigl[1 + \frac{1}{3n - 4}\Bigr].$$

The efficiency is unity for n = 2, since in this case the two statistics have
the same distribution. The efficiency decreases as the sample size increases,
and approaches 2/3 as a limiting value for n very large.
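The estimate and its efficiency are easily computed. The Python sketch below is an added illustration with hypothetical function names and made-up data; it evaluates the mean square successive difference, the variance estimate δ²/2, and the efficiency 2(n - 1)/(3n - 4).

```python
def mean_square_successive_difference(xs):
    """delta^2: sum of squared successive differences divided by n - 1."""
    n = len(xs)
    return sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)


def variance_estimate_from_differences(xs):
    """Unbiased estimate of the variance, delta^2 / 2."""
    return mean_square_successive_difference(xs) / 2


def efficiency(n):
    """Efficiency of delta^2 relative to the conventional estimate."""
    return 2 * (n - 1) / (3 * n - 4)


observations = [1.0, 3.0, 2.0, 5.0]  # made-up values in temporal order
print(mean_square_successive_difference(observations))
print(variance_estimate_from_differences(observations))
print([round(efficiency(n), 4) for n in (2, 5, 10, 50, 1000)])
```

The last line shows the efficiency falling from unity at n = 2 toward the limiting value 2/3.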

4. Summary of procedure. Having a statistic which estimates a parameter
of a population, it is desirable to know the distribution of that statistic as
computed from samples taken at random from that population. At present, the
distribution of δ² in samples of n has not been obtained. The difficulty is in the
fact that the successive differences are not independent. The first difference,
d_1 = x_2 - x_1, and the second difference, d_2 = x_3 - x_2, are related in that they
both involve x_2. Similar correlation exists between every successive pair of
differences between successive observations.

For n = 2, and samples taken from a normal population, the distribution of
δ² is known. Since

$$\delta^2 = (x_2 - x_1)^2 = 2\sum_{i=1}^{2} (x_i - \bar{x})^2 = 4s^2,$$

the distribution of δ² is similar to that of s² for this sample size.

For n = 3, the distribution of δ² has been derived analytically. The
derivation is indicated in Section 5 of this paper. For n > 3, only the moments of
the distribution have thus far been obtained. A Pearson type distribution has
been fitted to the first three moments to obtain an approximate representation
of the true distribution.

5. Distribution of δ². In the case of a sample of n taken from a normal
population, the probability that the first observation lies between x_1 and x_1 + dx_1,
while the second lies between x_2 and x_2 + dx_2, etc., is

If y_i = x_{i+1} - x_i, this expression becomes

. . . dj/n-J, 

where Q is a quadratic form in x_1 and the y's. Since δ² = Σ y_i²/(n - 1),
the probability that δ² shall be less than some value δ_0² is

(9) ««■< Ji) fj ■■■ J - 

"s »;«>*-»)»• 

1-1 

After the integration with respect to x_1 is carried out, the quadratic form in
the exponent may be normalized by a transformation to new coordinates z_i
linearly related to the y's. The z's may be so chosen that all the terms z_i² in
the exponent have the same coefficient, in which case

( 10 ) 




dzidzf • 

9 2^11-1 j 


As a result of such a transformation, the sphere of integration in (9) becomes an 
ellipsoid in (10). By changing to polar coordinates, with 


( 11 ) 



1-1 


F(«‘ < fij) = c/l e-*"’r"-*dQdr, 


in which Ω is the solid angle in the space of n - 1 dimensions. The limits of
integration with respect to Ω as a function of r must be found; this involves the
evaluation of the solid angle subtended by the surface bounded by the
intersection of the (n - 1)-dimensional sphere and the (n - 1)-dimensional ellipsoid.
If 12 = 

(12) Pif < «?) = C2 £ e-*'V(r)r"~* dr, 


in which a is the longest semi-axis of the (n - 1)-dimensional ellipsoid
corresponding to the given value of δ².

For n = 3, (9) becomes 

P(S’ < Jj) = // £ «xp [- 3 ^ (v! + 1 ^ + 9.W) 

»>+»«<*»» 

(13) “ i 3 ’'* ) J*'"*!'"'*** 

Normalizing the quadratic form in the exponent, 

(14) F(^ < * = // 
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— r*loo«* sin* 8] 129^ 


dddr 


and in polar coordinates 


(15) 


The integral in brackets can be shown to be a Bessel function of zero order; 
for let 

rVSff* = —2iu, 
r 


♦ = 2- - 2». 


then 

(16) 


= e-*“ £ = 2»-c"'Vo(tt). 

Consequently, (15) takes the form 

(17) W' < * - 

The probability density function

dF(«*) 


pis*) 


dS^ 


(18) 


= /o® 


vr 




\3«r*/ 


’V3‘ 




1 d* 


1 8* 


2* 3*<r« 2*4> 3V 


+ _J_ JI 4 . 1 

2*4*6* 3V‘* ”'J' 


6. Moments. The t-th moment of δ² about the origin is defined by

$$\mu'_t = E[(\delta^2)^t], \tag{19}$$


or 

( 20 ) 


in - Dv; 


(lii + d) 




For any value of t, the expansion can be performed, and similar terms
collected and enumerated. The values of x can be considered as true errors, i.e.
as deviations from the true mean, without affecting the conclusions. If the
original population from which the samples have been drawn is normal, with
standard deviation σ, then:


$$E(x^{2k+1}) = 0, \qquad E(x^{2k}) = \frac{(2k)!}{2^{k}\,k!}\,\sigma^{2k}, \tag{21}$$



and since, in the null case where the mean of the population remains constant, 
successive observations are independent, then 

E{xU]) ^ i^j 

E{x\x\) = E(x'')E{x*)f i 9^ j. 


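These relations are sufficient for the evaluation of μ'_t. For example, in the
case of the second moment, t = 2:

$$(n - 1)^2 \mu'_2 = E\Bigl[\Bigl(\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2\Bigr)^{2}\Bigr]. \tag{23}$$

Now:

$$\Bigl[2\sum_{i=1}^{n} x_i^2 - (x_1^2 + x_n^2) - 2\sum_{i=1}^{n-1} x_{i+1}x_i\Bigr]^2$$

$$= 4\Bigl(\sum x_i^2\Bigr)^2 + (x_1^2 + x_n^2)^2 + 4\Bigl(\sum x_{i+1}x_i\Bigr)^2 - 4(x_1^2 + x_n^2)\sum x_i^2 - 8\sum x_i^2 \sum x_{i+1}x_i + 4(x_1^2 + x_n^2)\sum x_{i+1}x_i$$

$$= 4\Bigl[\sum_{i=1}^{n} x_i^4 + \sum_{i \ne j} x_i^2 x_j^2\Bigr] + \bigl[x_1^4 + 2x_1^2x_n^2 + x_n^4\bigr] + 4\Bigl[\sum_{i=1}^{n-1} x_{i+1}^2 x_i^2\Bigr] - 4\Bigl[x_1^4 + x_1^2\sum_{i=2}^{n} x_i^2 + x_n^2\sum_{i=1}^{n-1} x_i^2 + x_n^4\Bigr]$$

$$\quad + [\text{terms containing odd powers of } x_i].$$

The mean of these terms is found by using (21) and (22), and the number of
each type of term present is enumerated:

$$4[n(3\sigma^4) + n(n - 1)\sigma^4] + [3\sigma^4 + 2\sigma^4 + 3\sigma^4] + 4[(n - 1)\sigma^4] - 4[3\sigma^4 + (n - 1)\sigma^4 + (n - 1)\sigma^4 + 3\sigma^4] = (4n^2 + 4n - 12)\sigma^4.$$

Consequently

$$\mu'_2 = \frac{4(n^2 + n - 3)}{(n - 1)^2}\,\sigma^4. \tag{24}$$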
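The coefficient 4n² + 4n - 12 can be checked by a different route from the term-by-term enumeration used above. The Python sketch below is an added illustration, and its technique is swapped in: it writes the sum of squared successive differences as a quadratic form x'Ax and uses the standard identity E[(x'Ax)²] = (tr A)² + 2 tr(A²) for independent normal variates with zero mean and unit variance. The function names are hypothetical.

```python
def successive_difference_matrix(n):
    """Matrix A with x'Ax equal to the sum of (x_{i+1} - x_i)**2, i = 1..n-1."""
    A = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        A[i][i] += 1
        A[i + 1][i + 1] += 1
        A[i][i + 1] -= 1
        A[i + 1][i] -= 1
    return A


def second_moment_coefficient(n):
    """(n-1)^2 * mu'_2 / sigma^4, computed as (tr A)^2 + 2 tr(A^2)."""
    A = successive_difference_matrix(n)
    trace = sum(A[i][i] for i in range(n))
    trace_of_square = sum(A[i][j] * A[j][i] for i in range(n) for j in range(n))
    return trace ** 2 + 2 * trace_of_square


for n in (2, 3, 5, 10):
    print(n, second_moment_coefficient(n), 4 * n * n + 4 * n - 12)
```

Both columns of the printed output agree, in accordance with (24).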


The first four moments about the origin were evaluated by this procedure,
and from these, the moments about the mean are readily determined. The
results are:

$$\mu'_1 = 2\sigma^2$$

$$\mu'_2 = \frac{4(n^2 + n - 3)}{(n - 1)^2}\,\sigma^4$$

$$\mu'_3 = \frac{8(n^3 + 6n^2 + 2n - 21)}{(n - 1)^3}\,\sigma^6$$

$$\mu'_4 = \frac{16(n^4 + 14n^3 + 53n^2 - 8n - 231)}{(n - 1)^4}\,\sigma^8 \tag{25}$$

$$\mu_1 = 0$$

$$\mu_2 = \frac{4(3n - 4)}{(n - 1)^2}\,\sigma^4$$

$$\mu_3 = \frac{32(5n - 8)}{(n - 1)^3}\,\sigma^6$$

$$\mu_4 = \frac{48(9n^2 + 46n - 112)}{(n - 1)^4}\,\sigma^8$$

It should be noted at this point that the above fourth moment is incorrect
for n = 2. One of the terms in the expansion of the right side of (20), for
t = 4, is

n— 1 

S t T' 2 2 

Xl** 2-X*+iX,-. 

•-1 

For n = 2, the mean value of this term is 

ECxixlxlxi) = Mxi)E(xi) = 9a*, 
whereas for n > 2, the mean value is 

E(xixtxt) + E^ix\ 2 x<+ix<^ + ^(xiX*_ixt) = (n + 3)a*. 


7. Pearson type fit to distribution of δ². From the moments it is found that

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{16(5n - 8)^2}{(3n - 4)^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3(9n^2 + 46n - 112)}{(3n - 4)^2}. \tag{26}$$

As n becomes large, β_1 and β_2 approach 0 and 3 respectively; the distribution
therefore appears to approach the normal for large samples. For finite sample
sizes, the values of β_1 and β_2 correspond to those of the Pearson Type VI
distribution,

The origin of this distribution is at δ² = -a_1, but the origin of the true
distribution must be at δ² = 0. By taking a_1 = 0 so that the origin is at δ² = 0,
we obtain what appears to be a suitable approximation


(27) 




The parameters are determined by equating the 1st, 2nd and 3rd moments of 
(27) to the corresponding moments of the true distribution, with the result that 

^ 371^ - lOn* - 18n* + 79n - 60 
'■ 8n* - 60n + 48 


9i 

(28) 


4 — + l)(9i + 3) 

4-m*(9*+1) 

2(gt - g» - 2) 

"9*+l ~ ’ 

_ 

B(g» + 1, — ?» — 1) 


Values of these parameters for selected values of n are given in Table I. The
sixth and seventh columns of this table give the values of β_2 for the distribution
(27) and for the true distribution, respectively.


TABLE I

(1)    (2)         (3)        (4)        (5)              (6)        (7)        (8)
 n     q1          q2         a2         c                β2 (27)    β2 True    Ratio (6)/(7)
 5     24.4391     0.6391    26.6000    5.8800 × 10^…     8.807      8.504      1.036
 7     31.1286     1.3857    23.2571    4.9285 × 10^…     6.948      6.758      1.028
10     41.2830     2.5079    20.9667    9.4934 × 10^…     5.658      5.538      1.022
15     58.2113     4.3806    19.2659    4.0240 × 10^…     4.718      4.645      1.016
20     75.1210     6.2543    18.4351    1.8063 × 10^…     4.269      4.217      1.012
25     92.0189     8.1285    17.9417    8.1097 × 10^…     4.006      3.965      1.010
50    176.4443    17.5018    16.9651    1.3386 × 10^…     3.494      3.475      1.005
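The "True" column of Table I follows directly from the central moments. The Python sketch below is an added illustration with hypothetical function names; it evaluates β_1 and β_2 from the closed forms of (26) and compares β_2 with the tabulated values for the sample sizes of Table I.

```python
def beta_1(n):
    """beta_1 = mu_3^2 / mu_2^3 for the mean square successive difference."""
    return 16 * (5 * n - 8) ** 2 / (3 * n - 4) ** 3


def beta_2(n):
    """beta_2 = mu_4 / mu_2^2 for the mean square successive difference."""
    return 3 * (9 * n * n + 46 * n - 112) / (3 * n - 4) ** 2


# Column (7) of Table I, keyed by sample size.
tabulated_beta_2 = {5: 8.504, 7: 6.758, 10: 5.538, 15: 4.645,
                    20: 4.217, 25: 3.965, 50: 3.475}

for n, tabulated in tabulated_beta_2.items():
    print(n, round(beta_1(n), 3), round(beta_2(n), 3), tabulated)
```

For large n the two functions tend to 0 and 3, the values for the normal curve.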


The Tables of the Incomplete Beta-Function [9] can be used to evaluate the
probability integral of the distribution (27),

$$\alpha = 1 - I_x(q_1 - q_2 - 1,\; q_2 + 1), \qquad x = \frac{a_2}{a_2 + \delta^2}, \tag{29}$$

for n ≤ 14. For n > 14, the probability integral may be determined by
quadrature. Some values of the probability integral for n = 50 are given in Table II.
A comparison with the integral of the normal curve having the same first two
moments indicates that a sample of somewhat more than 50 is required before
the normal curve becomes a satisfactory approximation to the distribution (27).

TABLE II

δ²/σ²     (29)       Normal
 .50     .00000     .00118
 .75     .00031     .00563
1.00     .00647     .02129
1.25     .04393     .06418


THE RETURN PERIOD OF FLOOD FLOWS
By E. J. Gumbel
New School for Social Research 

Introduction. Engineers have used various interpolation formulas to represent
the observed distribution of flood discharges. These formulas are sometimes
constructed ad hoc for a given stream, and have no general meaning. Most
of them are rather complicated.¹ Some authors have tried to introduce upper
and lower limits to the discharges, even though it is doubtful that such limits
exist. Others have introduced the third and fourth moments of the distribution,
in spite of the fact that these numerical values are subject to large errors. For
some formulas it is impossible to give a meaning to the constants; different
formulas applied to the same stream give rather contradictory results; and
consequently there is considerable confusion. For example, Slade [20] has stated that
"the statistical method in whatever form employed is an entirely inadequate
tool in the determination of flood frequencies." According to Saville [19] "the
engineer should satisfy himself that he has used an adequate number of methods,
whether mathematical, graphic or otherwise, which have real support from either
theory or experience, and then form his own judgement."

The main reason for this situation is that these studies have little or no
theoretical basis. The author believes it possible to give exact solutions,
exactitude being interpreted from the standpoint of the calculus of probabilities
[10]. Our solutions are simply the consequences of a truism: "The flood
discharges are the largest values of the discharges." The present study is but an
explanation of this statement.

Many American authors start with a statistical function, which we call the 
return period of floods. Therefore we shall first analyse the notion of return 
period and show how it can be derived as a consequence of the concept of
distribution. We then give a short résumé of the theory of largest values. The
discharge, and in consequence the flood discharge, is considered as an unlimited 
statistical variable; it is not necessary to determine its distribution. We are 
justified in representing the observed distribution of flows by one of the the- 
oretical distributions of largest values. The distribution we choose contains 
only two constants, and both have a clear hydrological meaning. The numeri- 
cal values are calculated by the method of moments. 

¹ In recent years many articles discussing this topic have been published by the American
Society of Civil Engineers and the American Geophysical Union [8]. A review of some of
the proposed formulas is given in the Water Supply Paper 771 [17].
The application of the notion of return period to the largest values leads to a
simple formula for the return period of the floods. In the last part of this paper
we represent the flood flows of the Rhône and Mississippi Rivers by our formula.

1. The return period. Let us consider a continuous statistical variable x
having a theoretical distribution w(x). The probability W(x) of a value less
than or equal to x, and the probability P(x) of a value greater than or equal to
x, are

(1)    $W(x) = \int_{-\infty}^{x} w(z)\,dz, \qquad P(x) = \int_{x}^{\infty} w(z)\,dz,$

where z denotes the variable of integration. Clearly

(1')   $W(x) + P(x) = 1.$

Let n be the number of observations. Let x_m (m = 1, 2, ..., n) be the
observed values arranged in increasing magnitude, where m is the serial number
beginning with the lowest ("from below"). The lowest observation has the
serial number m = 1, the highest has the serial number m = n. These observed
values will be written x_1 and x_n respectively. The number of observations
below or equal to x_m is m = n 'W(x_m), where 'W(x_m) is the observed relative
number corresponding to the probability W(x). The graphic representation of
this series is called a cumulative histogram.

In hydraulics many authors arrange the observations in decreasing magnitude.
Let _m x (m = 1, 2, ..., n) be these observed values. The serial number m is
counted in a descending scale ("from above"). For the largest value m = 1,
for the lowest value m = n. The number of observations above or equal to
_m x is m = n 'P(_m x), where 'P(_m x) corresponds to P(x). The numbers 'W(x_m)
will never decrease; the numbers 'P(_m x) will never increase. The mth value on
a descending scale is the (n − m + 1)th value on an ascending scale. Therefore

(2)    $n\,{}'P({}_m x) = n - n\,{}'W(x_m) + 1,$

and

(2')   $n P(x) = n - n W(x).$

The difference between formulas (2) and (2') will play a certain rôle later.
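
The bookkeeping behind relation (2) can be illustrated with a short sketch. It assumes a small hypothetical sample without tied values; the function name and the data are illustrative only.

```python
def serial_numbers(values):
    """Return two dicts mapping each value to its serial number m,
    counted from below and from above (no tied values assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    from_below = {v: i + 1 for i, v in enumerate(ordered)}
    from_above = {v: n - i for i, v in enumerate(ordered)}
    return from_below, from_above


sample = [3.1, 7.4, 1.2, 5.0, 9.8]  # hypothetical observations
n = len(sample)
from_below, from_above = serial_numbers(sample)
for v in sample:
    # Relation (2): n 'P(_m x) = n - n 'W(x_m) + 1 for the same observation.
    assert from_above[v] == n - from_below[v] + 1
```

The extra unit in (2), absent from (2'), arises because each observation is counted both among the values below or equal to it and among the values above or equal to it.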

Different methods are used in statistics in comparing the theoretical values
W(x) or P(x) and w(x) with the corresponding observations 'W(x_m) or 'P(_m x)
(cumulative frequencies) and Δ'W(x_m) (frequency distribution). They all have
in common an arrangement of observed values according to magnitude.

For the purpose of considering the observations in chronological order, we 
introduce a statistical criterion which at first glance may appear to have a new 
logical structure. It is assumed here that the observations are made at constant 
time intervals, and this interval is considered the unit of time. We suppose 
that the observations are homogeneous, i.e., subject to a common set of forces. 
Furthermore, we suppose that the events are independent of one another: the
occurrence of a high or low value for x has no influence on the value of any 
succeeding observation. Let us choose a low value x, and ask the following: 
After what number of observations does this or a greater value return? We 
calculate the mean of these chronological intervals between every two consecu- 
tive values, equal to or greater than x. We repeat these operations for a second, 
third, . . . till the penultimate value of x. 

These means are called the observed return periods. The criterion consists of 
the comparison of the observed, and the theoretical return period for increasing 
values of x. For a discontinuous variable we could obtain the return period for 
a value equal to x, (not equal to or greater than x). This average time, which 
is sometimes used in physics, does not interest us, as our variable, the discharge, 
is continuous. We limit our consideration to the return period of a value equal 
to or greater than x, called: value greater than x. 

The determination of the theoretical return period is a classical problem:
How many trials must, on the average, be made, in order that an event of a
given probability should happen? Our event, the realization of a value equal
to or greater than x, has the probability P(x) = 1 − W(x).

The mean number of trials T(x) which are necessary to obtain our event once,
is evidently

(3)    $T(x) = \frac{1}{1 - W(x)}$

or

(3')   $T(x) = \frac{1}{P(x)}.$


This value T(x) is the mean chronological interval between two values, equal
to or greater than x. If we start at the time when such a value has been
observed for the first time, we can interpret T(x) as the theoretical return period
of a value equal to or greater than x. We designate it as the theoretical return
period. This concept has not been used in statistics. It is a well-known
concept in hydraulics which was introduced by Fuller [6]. To every theoretical
distribution w(x) there is a corresponding return period T(x) and conversely,
to every theoretical return period T(x) there is a corresponding distribution

(4)    $w(x) = \frac{T'(x)}{T^2(x)},$

obtained by differentiating (3).
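
Formulas (3) and (4) can be checked numerically. The sketch below assumes a standard normal variable purely as an example of an initial distribution (the text does not prescribe one) and takes the derivative T'(x) by a central difference; all function names are illustrative.

```python
import math


def W(x):
    """Standard normal probability W(x); an assumed example distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def w(x):
    """Standard normal density w(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def T(x):
    """Theoretical return period, formula (3): T(x) = 1 / (1 - W(x))."""
    return 1.0 / (1.0 - W(x))


def density_from_return_period(x, h=1e-5):
    """Formula (4): w(x) = T'(x) / T(x)^2, with T' by central difference."""
    dT = (T(x + h) - T(x - h)) / (2.0 * h)
    return dT / T(x) ** 2


for x in (-1.0, 0.0, 1.5):
    print(x, T(x), w(x), density_from_return_period(x))
```

The last two printed columns should agree to within the accuracy of the finite difference, which is the content of (4).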

If the variable is without limit to the left, the return period will start with
T = 1. If the variable is limited to the left by x ≥ ε, the corresponding return
period will be

(5)    $T(\varepsilon) \geq 1 \quad \text{if} \quad W(\varepsilon) \geq 0.$
In the graphic representation, the return period T(x), which has a time
dimension, will be the abscissa and x the ordinate. Therefore we consider x as a
function of T(x); from (4) we obtain

(6)    $\frac{dx}{d \ln T(x)} = \frac{1}{w(x)\,T(x)},$

where ln signifies the natural logarithm. The increase of x as a function of
ln T(x) will be very rapid for small values of T. For a limited distribution
the same result is obtained, provided the probability W(ε) and the density of
probability w(ε) are sufficiently small. Clearly, the return periods of the three
quartiles are respectively 1⅓, 2, 4. The return period will always increase
with x. It will tend towards infinity even if the variable is limited to the right.
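
The return periods of the quartiles follow directly from (3) and hold for any continuous distribution. A minimal sketch in exact arithmetic (the function name is illustrative):

```python
from fractions import Fraction


def return_period(W):
    """Formula (3) applied to a given probability W: T = 1 / (1 - W)."""
    return 1 / (1 - W)


quartiles = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
periods = [return_period(q) for q in quartiles]
print(periods)  # [Fraction(4, 3), Fraction(2, 1), Fraction(4, 1)]
```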

Let us now consider the calculus of the observed return periods. Instead of
values equal to or greater than x_m we will only speak of values greater than x_m.
The observed return period is the interval between the first and the last
observation greater than x_m, divided by the number of intervals between all
observations greater than x_m. The number of observations greater than x_m is
n − n 'W(x_m). Between these observations there are n − n 'W(x_m) − 1 intervals.
This denominator is independent of the chronological order of the observed
values. We can calculate the mean of the observed intervals up to a value x_m
so that n − n 'W(x_m) = 2. For this value of x_m there are only two
observations, i.e., only one interval. In that case no mean can be calculated.

The numerator, the interval between the first and the last observation greater
than x_m, will be n − 1, provided that the first and the last value in chronological
order are greater than x_m. But in general the first value greater than x_m will
be the (k + 1)th in chronological order. The first value greater than x_m found
in the reverse chronological order will be the (k' + 1)th. Let k + k' = Z, then
the interval between the last and the first value greater than x_m is n − 1 − Z.
The mean observed interval is thus

(7)    ${}'T(x_m) = \frac{n - 1 - Z}{n - 1 - n\,{}'W(x_m)}.$

This magnitude depends only on the chronological order of the first and the
last value greater than x_m. It is independent of the chronological order of all
other observations. Even in the case Z = 0 this value differs from the theoretical
value (3). The observed value surpasses the theoretical value, even if the
frequency 'W(x_m) is identical with the probability W(x).
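
The computation of the mean observed interval can be sketched as follows. The series and the threshold are hypothetical, the function name is illustrative, and the function simply refuses to return a value when fewer than two values exceed the threshold.

```python
def observed_mean_interval(series, threshold):
    """Mean chronological interval between values greater than `threshold`:
    (n - 1 - Z) / (n - 1 - n'W), with Z = k + k' as defined in the text."""
    n = len(series)
    hits = [i for i, v in enumerate(series) if v > threshold]
    if len(hits) < 2:
        raise ValueError("at least two exceedances are needed")
    k = hits[0]                  # observations before the first exceedance
    k_prime = n - 1 - hits[-1]   # observations after the last exceedance
    Z = k + k_prime
    n_W = n - len(hits)          # n 'W(x_m): values not exceeding the threshold
    return (n - 1 - Z) / (n - 1 - n_W)


# Hypothetical chronological series of ten observations.
series = [5, 1, 7, 2, 2, 8, 1, 9, 3, 1]
print(observed_mean_interval(series, 4))  # exceedances at positions 0, 2, 5, 7
```

Here k = 0 and k' = 2, so Z = 2, and the four exceedances span seven time units divided into three intervals.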

In the general case, Z > 0, this difference is a function of Z. The number Z 
depends upon the times at which the observations begin and cease; but it is 
not a characteristic of the chronological order. As a result of these
disadvantages of formula (7) we prefer to introduce other definitions, in which the
chronological order does not enter. These definitions have an added advantage
in that they are constructed in a manner analogous to the theoretical formula.
The observed value which corresponds to (3) is

(8)    ${}'T(x_m) = \frac{1}{1 - {}'W(x_m)}$

or

(9)    ${}'T(x_m) = \frac{n}{n - m}.$


But this definition of the observed return period is not the only one which 
corresponds to (3). Starting with the serial number m, in a descending scale, 
Fuller [6] puts 

(8')   ${}''T({}_m x) = \frac{n}{m}.$

According to this definition, the return period of the mth value from below is

(9')   ${}''T(x_m) = \frac{n}{n - m + 1}.$

TABLE I

Two definitions of the observed return periods

observed    serial number   serial number   exceedance interval   recurrence interval
variable    from below      from above      formula (9)           formula (9')

x_1         1               n               n/(n − 1)             1
x_2         2               n − 1           n/(n − 2)             n/(n − 1)
x_m         m               n − m + 1       n/(n − m)             n/(n − m + 1)
x_{n−1}     n − 1           2               n/1                   n/2
x_n         n               1               –                     n/1


This observed return period corresponds to the theoretical return period (3').
The difference between (9) and (9') results from the fact that the relation (2)
between the observed cumulative frequencies 'W(x_m) and 'P(_m x) differs from the
relation (2') between the probabilities W(x) and P(x). The two definitions
of the observed return periods are related by

(10)   ${}'T(x_{m-1}) = {}''T(x_m) < {}''T(x_{m+1}).$

From a purely logical standpoint the first definition is as justifiable as the
second one. Both are used in hydraulics. In order to avoid confusion between
formulas (9) and (9') Horton [16] calls 'T(x_m) the exceedance interval, i.e., "the
average interval at which an event of given magnitude is exceeded," whereas
he defines "T(x_m), the recurrence interval, as "the average interval of occurrence
of values equalling or exceeding a given magnitude." Of course, the exceedance
interval surpasses the recurrence interval. Since both observed intervals
correspond to a common theoretical return period we designate both of them as
observed return periods.

The difference between formulas (9) and (9') is made clear in Table I.
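
The two observed return periods of Table I can be tabulated for any sample size. A minimal sketch, assuming a hypothetical n = 10; the function names are illustrative, and the exceedance interval is returned as None for the largest value, where it does not exist.

```python
def exceedance_interval(n, m):
    """Formula (9): n / (n - m); undefined for the largest value m = n."""
    if m == n:
        return None
    return n / (n - m)


def recurrence_interval(n, m):
    """Formula (9'): n / (n - m + 1), with m counted from below."""
    return n / (n - m + 1)


n = 10  # hypothetical number of observations
for m in range(1, n + 1):
    # columns: m from below, m from above, formula (9), formula (9')
    print(m, n - m + 1, exceedance_interval(n, m), recurrence_interval(n, m))
```

The printed rows show that the exceedance interval of the mth value equals the recurrence interval of the (m + 1)th value, so the two definitions differ by a shift of one serial number.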
Each of the definitions (9) and (9') and the theoretical expression T(x) has
different properties. For the lowest observation

$n\,{}'W(x_1) = 1; \qquad n\,{}'P({}_n x) = n.$

Therefore

${}'T(x_1) = 1 + \frac{1}{n - 1}; \qquad {}''T(x_1) = 1,$

whereas for an unlimited distribution

$\lim_{x \to -\infty} T(x) = 1.$

If the number of observations is sufficiently large the numerical differences
between the two observed periods are rather small, except for very large values
of the variable. For the last observation

$n\,{}'W(x_n) = n; \qquad n\,{}'P({}_1 x) = 1.$

Therefore the return period 'T(x_n) for the last observation does not exist.
According to the second definition the return period for the last value is equal to
the total number of observations. But in general there is only one observation
of the last value.

The preference given formula (9) over (9') corresponds with the preference 
given to W{x) over P{x) when comparing the theoretical with the observed 
values. Therefore it is natural to count m from below. Since both definitions 
are equally applicable and since they lead to different results for large values of 
the variable, one should not calculate the return period for a small number of 
observations. 

The observed return periods (9) and (9') differ from the theoretical return 
period (3) in the same way that the frequencies 'W(x_m) or 'P(_m x) differ from the
probabilities W(x) or P(x). The chronological order enters neither into formula
(7) nor into (9) or (9'). We need not take it into consideration, since the 
theoretical return period is obtained from the probability and the observed 
return period from the cumulative histogram. Therefore the usual statistical 
methods can be used for making the comparison between observed and theoreti- 
cal return periods. 

The return period is a statistical function like the distribution w(x) or the
probability W(x). No formula for T(x) that contradicts the properties of w(x)
can be accepted. The return period T(x) will contain the same number of
independent constants as the distribution w(x). Consequently the fit of the
theoretical curve T(x) to the observations 'T(x_m) or "T(x_m) cannot be improved by
introducing a new constant without also changing the distribution w(x). The
theoretical curve x = f(T) will fit the observed curves (x_m, 'T(x_m)) and
(x_m, "T(x_m)) in a way that depends upon the fit of W(x) and P(x) to 'W(x_m)
and 'P(_m x).

Let us suppose that w(x) contains k constants; that they are determined by the
method of moments which conserves the arithmetic mean $\bar{x}$, the mean of the
squares $\overline{x^2}$, etc. of the observed distribution. For the return period these
moments have a meaning. Let us consider for the sake of simplicity a positive
variable. The kth moment M_k

$M_k = \int_0^\infty x^k\,dW(x) = -\int_0^\infty x^k\,d(1 - W(x)) = k \int_0^\infty (1 - W(x))\,x^{k-1}\,dx$

is according to (3)

(11)   $M_k = k \int_0^\infty \frac{x^{k-1}}{T(x)}\,dx,$

whence for k = 1 and k = 2

(11')  $E(x) = \int_0^\infty \frac{dx}{T(x)}; \qquad E(x^2) = 2 \int_0^\infty \frac{x\,dx}{T(x)}.$

For a given distribution containing two constants, the method of moments
conserves the area and the center of gravity of the reciprocal of the return period.
Even if the method of moments gives the best determination of the constants
for the distribution, it need not give the best determination for the return
period. But if the observed return periods were used for the determination of
the constants we would get two sets, since there are two observed curves having
equal validity, but different values for large x. We will get one and only one
set if the constants are calculated from the observed distribution, for here the
difference between 'T(x_m) and "T(x_m) does not matter. The fact that we do
not take the constants from the observed return periods, but from another
statistical function, might be a cause for deviations between the observed and
the theoretical return periods.
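
The statement that the first two moments are the area and the first moment of the curve 1/T(x) can be verified numerically. The sketch below assumes, purely for illustration, a positive variable with W(x) = 1 − exp(−x/θ), so that T(x) = exp(x/θ), E(x) = θ and E(x²) = 2θ²; the trapezoid rule and the truncation of the integrals at x = 80 are choices made for this example.

```python
import math


def trapezoid(f, a, b, steps=100_000):
    """Plain composite trapezoid rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


theta = 2.0  # hypothetical scale constant of the assumed distribution


def T(x):
    """Return period of the assumed distribution: T(x) = exp(x / theta)."""
    return math.exp(x / theta)


# Formula (11) for k = 1 and k = 2.
first_moment = trapezoid(lambda x: 1.0 / T(x), 0.0, 80.0)
second_moment = 2.0 * trapezoid(lambda x: x / T(x), 0.0, 80.0)
print(first_moment, second_moment)
```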

Once the constants have been found, we compare the observed curves
(x_m, 'T(x_m)) and (x_m, "T(x_m)) with the theoretical curve x = f(T). To avoid
discontinuity the observed return period will be established for all values of Xm 
arranged in increasing order. 

If the observed return periods for small values of x are systematically smaller 
(greater) than the theoretical period, it is reasonable to conclude that there
exists an attraction (repulsion) for small values of the variable and a repulsion 
(attraction) for the large values. But it must be remembered that the observed 
values have different weights in that the return periods for small values of x are 
based on many observations. This number diminishes as x increases. The last 
observed return period is based only on two observations. Therefore the di- 
vergence between theory and observation will increase with the variable. With 
this precaution the criterion of the return period suggests one cause of difference 
between theory and observation. In order to apply this method to the largest 
values we must first establish the corresponding distribution. 
2. Theory of the largest value. Let x be a statistical variable unlimited to
the right having the distribution w(x). Among the N observed values, one will
be larger than the others. We wish to determine its theoretical value.

According to the principle of multiplication the probability 𝔚_N(x) that N
values are inferior to x is

(12)   $\mathfrak{W}_N(x) = W^N(x).$

This is the probability of x being the largest value. The largest value is a new
statistical variable which possesses a mode, a mean ū, a standard deviation s
and higher moments. To get the mean the distribution 𝔴_N(x) of the largest
value is needed. From (12) by differentiation

(13)   $\mathfrak{w}_N(x) = N\,W^{N-1}(x)\,w(x).$

The mode will be the solution of

(13')  $\frac{N - 1}{W(x)}\,w(x) + \frac{w'(x)}{w(x)} = 0.$

For a given initial distribution w(x) and for small N we have to solve this
equation. But the mean and the moments cannot be obtained in a general way by
the use of the exact distribution (13). However we can reach general solutions
if N is large, provided we limit ourselves to certain classes of initial distributions.
We have studied this problem in previous publications [11, 13]. For our present
purpose it is sufficient to give the results in a form due to R. von Mises [18].

We define a large value u of the variable x by

(14)   $N(1 - W(u)) = 1.$

This means that the expected number of observations equal to or greater than u
is one. Equation (14) is but another form of definition (3). The mean number
of trials is used in (3) whereas the original variable x is used in (14).

The probability α du that a value greater than u will be contained between u
and u + du is given by

(15)   $\alpha = \frac{w(u)}{1 - W(u)}.$

Obviously α and u are functions of N and the constants in the initial
distribution w(x). There are two limiting forms of the probability (12)

$\lim_{N \to \infty} W^N(x) = F(x); \qquad \lim_{N \to \infty} W^N(x) = \mathfrak{W}(x).$

If

(16)   $\lim_{N \to \infty} \alpha u = k > 0,$

we obtain

(17)   $F(x) = e^{-(u/x)^k}.$
This probability function was first established by Fréchet [5]. If

(18)   $\lim_{u \to \infty} \frac{d}{du}\left(\frac{1}{\alpha}\right) = 0,$

we obtain

(19)   $\mathfrak{W}(x) = e^{-e^{-\alpha(x - u)}}.$
This probability function is due to R. A. Fisher [4]. Let us consider the first 
limit. The initial distributions which lead to it belong to the Pareto type. 
For this distribution 

and condition (16) holds; for any value of x

$\frac{x\,w(x)}{1 - W(x)} = k.$

The distribution f(x) of the largest value, which corresponds to (17), is

(20)   $f(x) = \frac{k}{x} \left(\frac{u}{x}\right)^{k} e^{-(u/x)^k}.$

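The mode ũ_N of the largest value is the solution of

$\frac{d \ln f(x)}{dx} = -\frac{k + 1}{x} + \frac{k\,u^k}{x^{k+1}} = 0,$

hence

$\tilde{u}_N = u \left(\frac{k}{k + 1}\right)^{1/k}.$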

According to the definition (14) the mode of the largest value will increase
with N. For a finite number of observations, which is always the case, the
mode will be limited. But the moments of order k or higher will not exist.
For k < 1, no moment will exist. For k < 2, only the first moment, the mean,
exists, and so on.

Let us consider now the second limit (19). The initial distributions which 
lead to it belong to the exponential type. For this distribution [14] 


$w(x) = \ldots, \qquad x \geq 0,$

and for any value of x

$\frac{d}{dx}\,\frac{1 - W(x)}{w(x)} = 0,$
which means that condition (18) is fulfilled. Most of the distributions used in
statistics belong to this type. According to (19) the distribution of the largest
value is

(22)   $\mathfrak{w}(x) = \alpha\,e^{-\alpha(x - u) - e^{-\alpha(x - u)}}.$

If we introduce a reduced variable y without dimension by the linear
transformation

(23)   $y = \alpha(x - u),$

we get the reduced probability 𝔚(y)

(24)   $\mathfrak{W}(y) = \mathfrak{W}(x) = e^{-e^{-y}}.$

The numerical values of this function, calculated by means of Becker's tables [1],
are given in Table II, col. 1 and 2. The reduced distribution

(25)   $\mathfrak{w}(y) = e^{-y - e^{-y}}$


makes clear the meaning of u: the distribution has one and only one maximum 
which occurs for the reduced value y = 0. Therefore u is the mode of the 
largest value for a given set of N observations. For an initial distribution w(x)
satisfying (18), and for large N, definition (3) of the return period as a function
of x becomes identical with relation (14) which involves the number of
observations N and the corresponding most probable value u.
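
How closely the exact probability (12) approaches the double exponential form for large N can be illustrated numerically. The sketch assumes, as an example only, the initial distribution W(x) = 1 − exp(−x) for x ≥ 0, for which (14) gives u = ln N and (15) gives α = 1, and it takes the limiting probability in the form exp(−exp(−α(x − u))), i.e. the reduced form (24) with y = α(x − u).

```python
import math


def exact_largest(x, N):
    """Probability (12): W(x)^N for the assumed W(x) = 1 - exp(-x)."""
    return (1.0 - math.exp(-x)) ** N


def limiting_form(x, N):
    """exp(-exp(-alpha (x - u))) with u = ln N from (14), alpha = 1 from (15)."""
    u = math.log(N)
    alpha = 1.0
    return math.exp(-math.exp(-alpha * (x - u)))


N = 365
for x in (4.0, 6.0, 8.0, 10.0):
    print(x, exact_largest(x, N), limiting_form(x, N))
```

For N = 365 the two columns differ by less than one unit in the third decimal place at these points.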

We wish to decide which distribution of the largest value is to be used to 
represent the given observations. This decision depends, according to (16) and 
(18), on the nature of the initial distribution at the extreme values of the 
variable. If the law of the observed initial variable is known, a precise answer
can be given. But generally speaking, a distribution chosen to represent given
observations is nothing but an interpolation formula. Formulas having different
analytical properties may all give satisfactory results. One might fulfill
condition (16), and another (18). The conditions apply to the differential coefficient,
whereas the initial observations are always discontinuous. Therefore they will
not enable us to decide which, if any, of the conditions is met. For extreme
values of the variable x the observed differences are large and nonuniform, and
there is therefore no way to replace the differentiation by a finite difference.
Consequently we have to use the observations of the largest values to control 
the two competing theories and not the conditions. The fact that distribution 
(20) has higher moments only under certain conditions, is a strong practical 
argument in favor of distribution (22). Therefore the following development 
will be based on this distribution. 

It can be shown that the mean error θ of distribution (22) is related to the
constant α by

(26)   $\theta = 0.98/\alpha.$

Therefore the constant u is the most probable largest value for N observations
and 1/α a multiple of the mean error.
TABLE II

Probabilities and return periods of largest values

Column headings: reduced variable y; probability 𝔚(x); return period, log T(x);
z; flood discharges per second, in cubic meters (Rhône R.) and in 1000 cubic
feet (Mississippi R.).

TABLE III

Observed return periods
Rhône, Lyon (France) (1826-1936)

The substitution of the numerical values leads to

(30')  $z = 0.77970\,y - 0.45005.$

Conversely,

(31)   $y = 1.28255\,z + 0.57722.$


The value

(32)   $v = s/\bar{u},$

the coefficient of variation, is related to the product αu. By (27)
αū = αu + c and by (28)

(33)   $\alpha u = \frac{\pi}{\sqrt{6}\,v} - c.$

Therefore the numerical value of αu can also be considered as a characteristic
of an observed distribution of largest values.

For the two constants we calculate for the observed distribution of largest
values the two first moments

(34)   $\bar{u} = \frac{1}{n} \sum_{m=1}^{n} x_m$

and

(35)   $\overline{u^2} = \frac{1}{n} \sum_{m=1}^{n} x_m^2.$

n 

To get the observed standard deviation we use the Gaussian formula 

(36) « = (i? - if). 

According to (28) and (27)

(37)   $\frac{1}{\alpha} = 0.7796968\,s,$

and

(38)   $u = \bar{u} - \frac{0.5772157}{\alpha}.$

These formulas give the two constants in the distribution of largest values. 
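
The determination of the two constants from (37) and (38) can be sketched as follows. The sample of annual floods is hypothetical and the function name is illustrative. The standard deviation is computed here with the divisor n; this is an assumption of the sketch, and the optional flag switches to the divisor n − 1.

```python
import math


def largest_value_constants(floods, unbiased=False):
    """Return (alpha, u) from the first two moments of the observed floods."""
    n = len(floods)
    mean = sum(floods) / n                        # first moment (34)
    mean_sq = sum(x * x for x in floods) / n      # second moment (35)
    variance = mean_sq - mean * mean              # divisor n (assumed)
    if unbiased:
        variance *= n / (n - 1)                   # optional divisor n - 1
    s = math.sqrt(variance)
    inv_alpha = 0.7796968 * s                     # formula (37)
    u = mean - 0.5772157 * inv_alpha              # formula (38)
    return 1.0 / inv_alpha, u


floods = [10.0, 12.0, 14.0, 16.0, 18.0]  # hypothetical annual floods
alpha, u = largest_value_constants(floods)
print(alpha, u)
```

For long records the choice of divisor changes s, and hence 1/α, only slightly.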


3. Flood flows interpreted as largest values. We will now apply the theory
of largest values to flood flows. Let us consider the daily flow as a statistical 
variable, unlimited to the right. This idea is not new. The formulas proposed 
by Fuller [7], Hazen [15], and numerous other authors all incorporate this 
assumption. Gibrat [9] supposes that the daily flows vary according to Galton's 
distribution. Instead of postulating a specific formula for the distribution of 
flows we shall only suppose that it belongs to the usual exponential type, which 
means that condition (18) is fulfilled. 

We define a flood as being the largest value of the N = 365 daily flows. The
flood flows are therefore the largest values of flows. This commonplace implies
the distinction between floods and inundations. For each year there exists one
or more floods of the same magnitude, but there might exist several different
inundations or none at all. If there are several inundations in a year the
greatest one will be a flood; but a flood need not be an inundation: even a
dry year has a flood. We limit ourselves to floods, assume that N = 365 is a
large number, and represent the distribution of annual floods by the distribution
(22) of largest values.

There have been objections to the concept that the daily flow is an unlimited 
variable. Horton [16] believes that this implies the absurd idea of unlimited 
floods. This opinion is shared by Slade [20], who claims that there is a definite 
upper limit to the magnitude of the floods for a given stream. The theory of 
largest values confirms only partially Horton's opinion. If we should choose
distribution (20), the most probable annual flood will be limited. For this 
distribution, however, it might happen that the mean annual flood has no 
meaning. To avoid this we have chosen distribution (22), for which the mean 
annual flood and all the moments will be finite. A further justification of the 
use of (22) might be derived from the fact that Galton's distribution belongs to
the exponential type. As a final argument, numerical calculations show that 
formula (22) gives a better fit to the observed distributions of flows. 

The variable x is the annual flood flow measured in cubic meters or cubic 
feet per second. The mean ū is the annual mean flood, whereas u is the most
probable annual flood. The value s is the standard deviation of the distribu- 
tion of annual floods. Finally y is called the reduced flood. 

The distribution (22) possesses the properties of the observed distribution of 
flood flows. It is asymmetrical; rising rather quickly but falling rather slowly. 
The modal value is to the left of the mean (see Fig. 3). 

To apply the theory of return periods let us consider the event of the highest 
annual discharge being greater than x. We have to replace in formula (3) the 
general probability W(x) by the probability of flood discharges (19). The
number of observations n is the number of years for which observations exist.

To use formula (3) we have to suppose that the intervals between the suc- 
cessive floods are all equal to one year. This assumption conforms more or less 
to the seasonal nature of floods. 

The return period of a flood greater than x

(39)   $T(x) = \frac{1}{1 - e^{-e^{-y}}},$

is the arithmetic mean of the intervals between two years, which have a flood
discharge greater than x; the discharges for the intervening years are all less
than x. Therefore T(x) is the mean of the number of years for which x will be
surpassed once. Formula (39) gives the meaning of u from the standpoint of
the return period. For y = 0

$T(u) = \frac{e}{e - 1}.$
The return period T(u) of the most probable annual flood is 1.58198 years. In
other words, the constant u is the flood discharge with return period 

(40) log T{u) = 0.19920 

where log signifies the common logarithm. The return period of the mean 
annual flood is by (27) and (39) equal to 2.32762 years. 
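
The two numerical values quoted above can be reproduced from the reduced form of the return period, T = 1/(1 − exp(−exp(−y))). The sketch assumes that the mean annual flood corresponds to the reduced value y = 0.5772157, the constant appearing in (38); the function name is illustrative.

```python
import math


def return_period(y):
    """Return period of the reduced flood y: 1 / (1 - exp(-exp(-y)))."""
    return 1.0 / (1.0 - math.exp(-math.exp(-y)))


T_mode = return_period(0.0)        # most probable annual flood, y = 0
T_mean = return_period(0.5772157)  # mean annual flood (assumed reduced value)
print(T_mode, math.log10(T_mode), T_mean)
```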

Let us now consider the relation between the flood discharge x and its return 
period for small and large values of x. To small values of x correspond large 
negative values of y and therefore return periods T approximating 1. The 
distribution (25) of the largest values being unlimited, the flood discharge con- 
sidered as a function of log T will by (6) increase rapidly at first. To large 
values of x correspond large values of y and T(x). If we introduce the natural
logarithm, (39) gives

$-\ln\left(1 - \frac{1}{T(x)}\right) = e^{-y}.$

For large values of x, viz., T(x) ≥ 10, it is sufficiently accurate to use

$\frac{1}{T(x)} = e^{-y},$

so that

(41)   $y = \ln T(x).$

If the common logarithm is used,

(42)   $\log T(x) = 0.434294\,\alpha(x - u).$

The logarithm of the mean number of years for which the flood discharge will 
once be exceeded, converges towards a linear function of x. This property of 
the distribution of largest values was established by M. Coutagne [2]. Let us 
write 

(43)   $x = u + \frac{2.30258}{\alpha} \log T(x).$

Then 1/α can be considered as a measure of the increase of a flood discharge
with respect to the logarithm of time.

According to the general formulas (6) and (42) the shape of the return period 
as a function of the flood discharge x is as follows: at the beginning i.e., for small 
flood discharge, the return periods are close to 1 and increase very slowly. At 
the end, i.e., for large flood discharges, the logarithm of the return period con- 
verges to a linear function of x. 
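
The quality of the asymptotic relation y = ln T can be judged by comparing it with the exact inversion of the return period formula, y = −ln(−ln(1 − 1/T)). A minimal sketch with illustrative function names:

```python
import math


def reduced_flood_exact(T):
    """Exact reduced flood for return period T: y = -ln(-ln(1 - 1/T))."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def reduced_flood_asymptotic(T):
    """Asymptotic relation (41): y = ln T."""
    return math.log(T)


for T in (2, 5, 10, 50, 100):
    print(T, reduced_flood_exact(T), reduced_flood_asymptotic(T))
```

The difference between the two columns is about 0.05 at T = 10 and shrinks roughly in proportion to 1/T thereafter, while for small T the linear formula is visibly too high.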

Another form of (43) is

(44)   $\frac{x}{u} = 1 + \frac{2.30258}{\alpha u} \log T(x).$


The ratio of the flood discharge which will be exceeded in the mean once in T 
years to the modal annual flood converges to a linear function of the logarithm 
of the return period. The constant 1/(αu) of dimension zero depends, by (33),
on the coefficient of variation. Its value is a characteristic of the stream. If
we introduce the arithmetic mean ū and the standard deviation s we obtain
by (42), (27), and (28)

$x = \bar{u} - 0.45005\,s + (0.77970)(2.30258)\,s \log T(x).$

Therefore, approximately,

(45)   $\frac{x}{\bar{u}} = 1 - 0.45\,v + 1.795\,v \log T(x).$

The right hand member of this linear equation contains only one constant, the
coefficient of variation of the floods. Finally by (42) and (31)

(46)   $\log T(x) = 0.25068 + 0.55700\,\frac{x - \bar{u}}{s}.$
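
As a usage sketch of these asymptotic relations, the flood discharge for a given large return period can be computed from the mean and standard deviation of the annual floods, and the result can be converted back with the relation log T = 0.25068 + 0.55700 (x − ū)/s. The numerical values of the mean and standard deviation below are hypothetical, and the function names are illustrative.

```python
import math


def flood_for_return_period(mean, s, T):
    """x = mean - 0.45005 s + (0.77970)(2.30258) s log10(T), for large T."""
    return mean - 0.45005 * s + 0.77970 * 2.30258 * s * math.log10(T)


def log_return_period(mean, s, x):
    """log10 T = 0.25068 + 0.55700 (x - mean) / s, for large x."""
    return 0.25068 + 0.55700 * (x - mean) / s


mean, s = 1000.0, 300.0  # hypothetical annual mean flood and standard deviation
x100 = flood_for_return_period(mean, s, 100.0)
print(x100, log_return_period(mean, s, x100))
```

The second printed value should be close to 2, the common logarithm of the return period used to compute the flood; the small discrepancy comes from the rounding of the printed constants.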


There is still another way of interpreting these asymptotic formulas. Let
T(2x) be the return period of the value 2x, then by (43)

$2x = u + \frac{\ln T(2x)}{\alpha},$

therefore

$2 = \frac{\alpha u + \ln T(2x)}{\alpha u + \ln T(x)},$

and finally

(47)   $T(2x) = T^2(x)\,e^{\alpha u}.$


The return period of a flood of magnitude 2x is equal to the square of the 
return period of x multiplied by a factor which depends only upon the coefficient 
of variation. 
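
Relation (47) can be confirmed numerically within the asymptotic approximation T(x) = exp(α(x − u)). The constants α and u and the discharge x used below are hypothetical.

```python
import math

alpha, u = 0.002, 2500.0  # hypothetical constants of a stream


def T_asymptotic(x):
    """Asymptotic return period for large x: T(x) = exp(alpha (x - u))."""
    return math.exp(alpha * (x - u))


x = 4000.0
lhs = T_asymptotic(2.0 * x)
rhs = T_asymptotic(x) ** 2 * math.exp(alpha * u)  # relation (47)
print(lhs, rhs)
```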

All these asymptotic formulas are good approximations only for return periods
above ten years, which means according to Table II, $y \ge 2.25$ or according
to (23), (30) and (31) $x \ge \bar{u} + 1.3\,s$. The corresponding value of the flood
probability is by (3) $\mathfrak{W}(x) \ge 0.9$. The consequences of (41) can be applied to
only 10% of the observations, i.e. to the large flood discharges. Their observed
return periods are based on a few observations and may therefore differ
considerably from the theoretical values. In spite of the above restrictions the
linear formula (43) has a meaning for values of T equal to or greater than unity.
We now ask: How will the most probable largest value increase with the number
of observations? This number of years can again be called T. The answer to
the above question requires the solution of (13') where the distribution (26) of
largest values $\mathfrak{w}(y)$ must be introduced as the initial distribution $w(x)$.

From (24)

$$(T - 1)\,e^{-y} - 1 + e^{-y} = 0,$$

or

$$T e^{-y} = 1,$$

which is identical with (41). For $T = 1$ the most probable annual flood is of
course $u$. Therefore the relation (41), valid for $T \ge 1$, means: The most
probable flood $u(T)$ to be reached within $T$ years is a linear function of the
logarithm of $T$

$$u(T) = u + \frac{1}{\alpha} \ln T. \tag{41'}$$
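
The statement that the mode of the largest of T annual floods lies at $y = \ln T$ can be verified numerically. The sketch assumes the reduced distribution of largest values in the form $\mathfrak{W}(y) = \exp(-e^{-y})$, so that the density of the largest among T independent values is $T\,\mathfrak{W}^{T-1}\mathfrak{w}$; the grid search and all names are illustrative.

```python
import math


def log_density_of_largest(y, T):
    """Log density of the largest of T values, each with W(y) = exp(-exp(-y))."""
    # log[T * W^(T-1) * w] with log W = -exp(-y) and log w = -y - exp(-y)
    return math.log(T) - (T - 1) * math.exp(-y) - y - math.exp(-y)


def most_probable_largest(T, lo=-5.0, hi=15.0, steps=20001):
    """Locate the mode on a uniform grid of reduced values y."""
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(grid, key=lambda y: log_density_of_largest(y, T))


for T in (1, 10, 100):
    print(T, round(most_probable_largest(T), 2), round(math.log(T), 2))
```

In the original variable the mode is $u + (1/\alpha)$ times the reduced value found here, which is the straight line (41').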


The constant $1/\alpha$ is the slope of this straight line. The results (41)-(46) are
related to Fuller's well-known formula [6]. This author, the first to investigate
flood flows systematically, proposed a linear relation between the logarithm of
the return period and the arithmetic mean of the flood discharges greater than
the mth value (m taken from above). A similar empirical formula has been
stated by Lane [7] and has been applied by Saville [19]. The similarities and
differences between these interpolation formulas and our theory can be stated
in the following way: If we start from the theory of largest values we reach
these formulas as asymptotic expressions for the return period of large floods.
Considered this way, our theory gives a certain justification to Fuller's
hypothesis. But Fuller's and similar formulas were intended to apply to all flood
discharges. Now, the distribution of the flood discharges (4) corresponding to
these return periods does not fit the observations. It can be shown that these
formulas involve the assumption of a simple exponential distribution $\psi(x)$ for
the flood discharges

$$\psi(x) = \frac{1}{\bar{u} - \epsilon}\, e^{-\frac{x - \epsilon}{\bar{u} - \epsilon}} \tag{48}$$

and the existence of a lower limit $\epsilon$ of the flood discharges given by $\epsilon = \bar{u} - s$.
In Fuller's formula all flood discharges must be greater than 2/3 of the mean 
annual flood. The density of probability always diminishes with increasing 
magnitude of the flood. This neglects the ascending branch (about one third) 
of the distribution of floods (see Fig. 3) and is incompatible with the observed 
facts. We therefore prefer our formula which takes account of the total varia- 
tion, but we do not minimize the importance of Fuller's work which has led to 
much valuable research. 

Formula (39) gives the theoretical return periods $T(x)$ as a function of the
reduced flood discharge $y$, and holds for the entire range of observations. The
general numerical values are given in Table II, cols. 1 and 3. For a given stream,
the return period of a flood discharge greater than $x$ depends by (23) upon the
two constants $\alpha$ and $u$. If these values have been calculated by (37) and (38)
the theoretical flood discharge $x$ corresponding to $T(x)$ is obtained by the
linear transformation

$$x = u + y/\alpha. \tag{49}$$

The asymptotic formula (42) suggests the coordination of the flood discharges
$x$ and the logarithm of the return periods.

4. Rhône and Mississippi Rivers. We think that our system of formulas is
simple, logically consistent and free of artificial assumptions. Now it remains
to be shown that the arithmetic involved is simple and that the results fit the
observations. For the Rhône we shall analyze the observed cumulative
frequency, the distribution, and the return periods. For the Mississippi River
we shall limit ourselves to the return periods.

For each year we choose the maximum of the daily discharges (we do not use
momentary peaks). The 111 values $x_m$ for the Rhône 1826-1936 published by
Coutagne [3] and arranged in order of increasing magnitude are given in Table III
(col. 1). The supposition that the intervals between consecutive floods are all
equal to one year is not always true. Only 77 of the 111 floods occurred between
October and March, whereas 34 were scattered throughout the year. But the
differences in the lengths of the intervals compensate each other. The second
column of Table III contains the serial number m. According to (9) we
calculate for the mth observed flood discharge $x_m$, taken in ascending magnitude,
the logarithm of the observed return period $\log n/(n - m)$ (col. 3), where n = 111
and m = 1, 2, ..., 110, and obtain the exceedance intervals. The other
observed curve, the recurrence interval, is obtained by (10) through the
coordination of $x_{m+1}$ and $\log n/(n - m)$. Both curves are plotted in Fig. 1. The
recurrence and exceedance intervals differ for the large flood discharges. The
observed flood discharges arranged in increasing magnitude are plotted in the
cumulative histogram, Fig. 2.

TABLE IV
Calculation of constants

Stream                                          Rhône            Mississippi River
Observation station                             Lyon (France)    Vicksburg (Miss.)
Period                                          1826-1936        1890-1939

Number of observations        $n$               111              50
Annual mean flood             $\bar{u}$         2,493.6          1,355.6
Mean squared flood            $\overline{u^2}$  6,707,555.0      1,951,828.8
Standard deviation            $s$               703.1            341.3
Constant                      $1/\alpha$        548.2            266.1
Most probable annual flood    $u$               2,177.0          1,201.9

TABLE V
Observed and theoretical distributions of flood discharges
Rhône

Columns: (1) reduced variable $y$; (2) variable $x$; (3) midpoints $x + \Delta x/2$;
(4) observed distribution $111\,\Delta'\mathfrak{W}(x)$; (5) theoretical distribution
$111\,\Delta\mathfrak{W}(x)$; (6) cumulative frequency $111\,\mathfrak{W}(x)$.

   (1)      (2)      (3)      (4)       (5)        (6)
  -2.75     670
  -2.50              807       1                   0.00
  -2.25     944                         0.01       0.01
  -2.00             1081       1        0.34       0.07
  -1.75    1218                         1.19       0.35
  -1.50             1355       7        3.03       1.26
  -1.25    1492                         6.07       3.38
  -1.00             1629       5        9.98       7.33
  -0.75    1766                        14.02      13.36
  -0.50             1903      13       17.38      21.35
  -0.25    2040                        19.49      30.74
   0.00             2177      21       20.21      40.84
   0.25    2314                        19.68      50.95
   0.50             2451      19       18.26      60.52
   0.75    2588                        16.31      69.21
   1.00             2725      14       14.14      76.83
   1.25    2862                        11.97      83.35
   1.50             2999       9        9.94      88.80
   1.75    3136                         8.15      93.29
   2.00             3273       8        6.61      96.95
   2.25    3410                         5.30      99.90
   2.50             3547       6        4.23     102.25
   2.75    3685                         3.45     104.13
   3.00             3822       4        2.65     105.70
   3.25    3959                         2.00     106.78
   3.50             4096       2        1.64     107.70
   3.75    4233                         1.28     108.42
   4.00             4370       1        1.01     108.98
   4.25    4507                         0.79     109.43
   4.50             4644       0        0.61     109.77
   4.75    4781                         0.48     110.04
   5.00             4918                0.38     110.25
   5.25    5055                         0.30     110.42
   5.50             5192                0.23     110.55
   5.75    5329                         0.18     110.65
   6.00             5466                0.27     110.73

  Total                      111      111.00

To compare these observations with our theory, we calculate the two
constants $1/\alpha$ and $u$ according to the formulas (34)-(38). The values $\sum x_m$ and
$\sum x_m^2$ are given at the end of Table III. Division by n = 111 gives the mean
flood $\bar{u}$ and the mean squared flood $\overline{u^2}$ (Table IV). The Gaussian correction
being 1 + 1/110 we obtain from formula (36) the standard deviation $s$ (Table IV)
and finally from (37) and (38) the constant $1/\alpha$ and the most probable annual
flood $u$. From the numerical values in Table IV the linear transformation (49)
for the Rhône is

$$x = 2177.03 + 548.19\,y.$$
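
The arithmetic behind Table IV is short enough to restate as code. The following sketch recomputes the Mississippi column from the two sums printed with Table VI, assuming that (36) applies the Gaussian correction $n/(n-1)$ to the variance and that (37) and (38) are the relations $1/\alpha = 0.77970\,s$ and $u = \bar{u} - 0.57722/\alpha$ given in the summary; the function name is hypothetical.

```python
import math


def gumbel_constants(n, sum_x, sum_x2):
    """Return (mean, mean square, s, 1/alpha, u) from n, sum of x and sum of x^2."""
    mean = sum_x / n
    mean_square = sum_x2 / n
    # Gaussian correction: multiply the variance by n / (n - 1) = 1 + 1/(n - 1)
    s = math.sqrt((mean_square - mean ** 2) * n / (n - 1))
    inv_alpha = 0.77970 * s
    u = mean - 0.57722 * inv_alpha
    return mean, mean_square, s, inv_alpha, u


# Mississippi River at Vicksburg, 1890-1939 (sums quoted with Table VI)
mean, mean_square, s, inv_alpha, u = gumbel_constants(50, 67_780, 97_591_440)
print(f"mean={mean:.1f} s={s:.1f} 1/alpha={inv_alpha:.2f} u={u:.2f}")
```

The results agree with the Mississippi column of Table IV and with the transformation $x = 1201.98 + 266.14\,y$ to the printed precision.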
TABLE VI
Observed return periods
Mississippi River, Vicksburg (Miss.) (1890-1939)

Flood       Serial    Return period      Flood       Serial    Return period
discharge   number    $\log {}'T(x_m)$   discharge   number    $\log {}'T(x_m)$
$x_m$       $m$                          $x_m$       $m$

   760        1         0.0088            1357        26         0.3188
   866        2         0.0178            1457        27         0.3373
   870        3         0.0269            1397        28         0.3566
   912        4         0.0362            1397        29         0.3768
   923        5         0.0458            1402        30         0.3980
   945        6         0.0555            1406        31         0.4202
   990        7         0.0655            1410        32         0.4437
   994        8         0.0758            1410        33         0.4686
  1018        9         0.0862            1426        34         0.4949
  1021       10         0.0969            1453        35         0.5229
  1043       11         0.1079            1475        36         0.5529
  1057       12         0.1192            1480        37         0.5851
  1060       13         0.1308            1516        38         0.6198
  1073       14         0.1427            1516        39         0.6576
  1185       15         0.1549            1536        40         0.6990
  1190       16         0.1675            1578        41         0.7448
  1194       17         0.1805            1681        42         0.7959
  1212       18         0.1939            1721        43         0.8539
  1230       19         0.2076            1813        44         0.9208
  1260       20         0.2219            1822        45         1.0000
  1285       21         0.2366            1893        46         1.0969
  1305       22         0.2518            1893        47         1.2219
  1332       23         0.2676            2040        48         1.3980
  1342       24         0.2840            2056        49         1.6990
  1353       25         0.3011            2334        50

$\sum x_m = 67{,}780.$    $\sum x_m^2 = 97{,}591{,}440.$


This leads to the determination of the theoretical flood discharges. The
theoretical return periods $\log T(x)$ are given in Table II, col. 3 as a function of the
reduced variable $y$ and of $x$ (col. 4). The discharges $x$ obtained by letting
$y$ take on the values -2.75 to 6.00 in the linear transformation, are given in
Table V, cols. 2 and 3 and plotted in Fig. 1. The distances $\Delta x$ used in the
calculations of the theoretical discharges are $1/(4\alpha) = 137.05$.
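
The theoretical columns of Table V follow mechanically from the linear transformation. A sketch, assuming the reduced probability function $\mathfrak{W}(y) = \exp(-e^{-y})$ for the cumulative frequencies and the Rhône constants quoted above; the function names are illustrative and not taken from the original.

```python
import math

U, INV_ALPHA, N = 2177.03, 548.19, 111   # Rhone constants and number of years


def discharge(y):
    """Theoretical discharge x = u + y / alpha for a reduced value y."""
    return U + INV_ALPHA * y


def cumulative_frequency(y):
    """N * W(y), assuming W(y) = exp(-exp(-y))."""
    return N * math.exp(-math.exp(-y))


def class_frequency(y, half_width=0.25):
    """Theoretical frequency of the class centred at y (difference of cumulatives)."""
    return cumulative_frequency(y + half_width) - cumulative_frequency(y - half_width)


# reduced values -2.75, -2.50, ..., 6.00 as in Table V
for k in range(36):
    y = -2.75 + 0.25 * k
    print(f"{y:6.2f} {discharge(y):7.0f} {class_frequency(y):7.2f} {cumulative_frequency(y):7.2f}")
```

Small differences from the printed table in the last digit are to be expected, since the table was computed by hand from rounded cumulative values, and the first and last classes are treated differently there.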

Along the abscissa are plotted the logarithm of the return periods and the
return periods in years; along the ordinate are plotted the corresponding flood
discharges and the modal annual flood $u$. The straight line from the point $(u, 0)$
to the asymptote gives the most probable flood as a function of time. The
theoretical curve corresponds quite closely with the general course of the
observations. For small floods the theoretical return periods are practically
identical with the observed values. But for the very large floods the theoretical
curve surpasses both the exceedance and recurrence intervals.

Fig. 1. Rhône at Lyon (France) 1826-1936. Observations, Table III: recurrence
intervals, exceedance intervals, return periods. Theory, Table II, cols. 3 and 4.
Extrapolation.

The observed cumulative histogram is shown in Fig. 2. We calculate from
Table II, col. 2, the frequencies $111\,\mathfrak{W}(x)$ (Table V, col. 6). These theoretical
values $(x, 111\,\mathfrak{W}(x))$ are also plotted in Fig. 2. The agreement between theory
and observations is very good.

For the comparison of the observed and theoretical distributions of the flood
discharges we use what might be called the natural classification. For the
observations, the length of the class intervals and the beginning of the first class
interval are arbitrary. In order to obtain the observed distribution of the flood
discharges, it is natural to use the theoretical class intervals set forth in Table V,
col. 2. The data of the third column can be interpreted as the midpoints of the
class intervals given in col. 2. The frequencies for these class intervals are
obtained from Table III, and are given in Table V, col. 4. The observed
distribution is shown in Fig. 3. To obtain the corresponding theoretical distribution we
calculate from Table V, col. 6, the difference between two cumulative frequencies
disjoined by one, i.e., we pair consecutively the first and third, the second and
fourth items and so on. This theoretical distribution given in col. 5 and the
observed distribution are based on class intervals of the same length. Fig. 3
shows that the theoretical distribution $\Delta\mathfrak{W}(x)$ of the largest values agrees in a
satisfactory way with the observed distribution $\Delta'\mathfrak{W}(x)$ of the flood discharges.

Fig. 2. Cumulative Frequency of the Flood Discharges. Rhône, Lyon (France)
1826-1936. Observations, Table III, cols. 1 and 2. Theory, Table V, cols. 2, 3 and 6.

Fig. 3. Distribution of the Flood Discharges. Rhône, Lyon (France) 1826-1936.
Observations, Table V, cols. 2, 3 and 4. Theory, Table V, cols. 2, 3 and 5.

Table VI, col. 1, gives the corrected* flood discharges $x_m$, measured in units of
1000 cubic feet per second, for the Mississippi River at Vicksburg (1890-1939),
(n = 50), arranged according to increasing magnitude; col. 2 gives the serial
number m. We calculate the logarithm of the observed return periods
$\log n/(n - m)$ (col. 3). The observations $(x_m, \log {}'T(x_m))$ and $(x_{m+1}, \log {}'T(x_m))$
are plotted in Fig. 4. The constants obtained by formulas (34)-(38) are shown
in Table IV. By (49) the theoretical floods $x$ corresponding to the return
periods $T(x)$ presented in Table II, col. 3, are

$$x = 1201.98 + 266.14\,y.$$

These floods are given in Table II, col. 5. The class interval used is

$$1/(4\alpha) = 66.5.$$

* These data have been put at my disposal through the courtesy of Mr. A. E. Brandt of
the U. S. Department of Agriculture.

The theoretical curve $(x, \log T(x))$, plotted in Fig. 4, agrees in a very satisfactory
way with the observations. For the large floods the theoretical return periods
are between the exceedance and recurrence intervals.

The calculations of the theoretical return periods for other streams, e.g. the 
Columbia, Connecticut, Cumberland, Rhine, and Tennessee Rivers, for which 
reliable observations exist for more than 60 years, also show a good agreement 
with the observations. The goodness of fit diminishes for streams for which 
the number of observations is smaller and for which the data are not very 
reliable. 



Fig. 4. Mississippi River at Vicksburg (Miss.) 1890-1939. Observations,
Table VI: recurrence intervals, exceedance intervals, return periods. Theory,
Table II, cols. 3 and 5. Extrapolation.

5. Summary and conclusions. In order to apply any theory we have to
suppose that the data are homogeneous, i.e. that no systematic change of climate
and no important change in the basin have occurred within the observation
period and that no such changes will take place in the period for which
extrapolations are made. It is only under these obvious conditions that forecasts
can be made.

The theoretical return period $T(x)$, the mean number of years between two
annual flood discharges greater than or equal to $x$, is a statistical function such
as the distribution $w(x)$ or the probabilities $W(x)$ and $P(x)$. There are two
sets of observed values corresponding to the theoretical set: the exceedance
interval $'T(x_m)$, formula (9), and the recurrence interval $''T(x_m)$, formula (9'),
$x_m$ being the mth flood discharge, where m is counted from below. As any
theory must include both notions, no separate theory for exceedance or
recurrence intervals is possible.

The return period $T(x)$ of a flood discharge $x$ is found by formula (39). For
large values of $x$ the flood discharge converges toward a linear function (42) of
the logarithm of the return period. This is the scientific basis of Fuller's
empirical formula. The two constants of our formula, $u$ and $1/\alpha$, are, respectively,
the most probable annual flood discharge and a multiple of the standard
deviation (28). Their values depend upon the drainage basin and known geological
and meteorological factors. It is beyond our present task to consider the
influence of these factors. Our method can be summarized by the following rules:

1) For each year find the maximum daily discharge $x_m$ (do not use momentary
peaks) and arrange these n data in increasing magnitudes.

2) Calculate for each discharge $x_m$ (m = 1, 2, ..., n - 1), the values
$\log {}'T(x_m) = \log n - \log (n - m)$ and plot the curves $x_m$, $\log n/(n - m)$, and
$x_{m+1}$, $\log n/(n - m)$. These are the observed exceedance and recurrence
intervals.

3) Calculate the annual mean flood $\bar{u}$ and the annual mean squared flood $\overline{u^2}$;
determine according to (36)-(38) the standard deviation

$$s = \sqrt{\frac{n}{n-1}\left(\overline{u^2} - \bar{u}^2\right)}$$

and the two constants

$$1/\alpha = 0.77970\,s, \qquad u = \bar{u} - \frac{0.57722}{\alpha}.$$

4) The theoretical flood discharges $x$ corresponding to the logarithm of the
return period $T(x)$ given in Table II, col. 3, are obtained by the linear
transformation

$$x = u + y/\alpha$$

where $y$ is taken from Table II, col. 1. Plot $x$ as a function of $\log T(x)$. For
large values of $x$ and for extrapolation it is sufficient to use the linear asymptote
obtained graphically.
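
The four rules can be sketched in Python as follows. The sample of annual maxima is hypothetical and serves only to exercise the functions; the reduced variable is obtained from $y = -\ln(-\ln(1 - 1/T))$, which assumes that (39) has the form $T = 1/(1 - \exp(-e^{-y}))$ and replaces the lookup in Table II. All names are illustrative.

```python
import math


def observed_log_return_periods(floods):
    """Rules 1 and 2: sort the maxima and pair them with log10 n/(n - m), m = 1..n-1."""
    x = sorted(floods)
    n = len(x)
    logs = [math.log10(n / (n - m)) for m in range(1, n)]
    exceedance = list(zip(x[:-1], logs))   # points (x_m, log n/(n - m))
    recurrence = list(zip(x[1:], logs))    # points (x_{m+1}, log n/(n - m))
    return exceedance, recurrence


def fit_constants(floods):
    """Rule 3: standard deviation s, constant 1/alpha and most probable flood u."""
    n = len(floods)
    mean = sum(floods) / n
    mean_square = sum(v * v for v in floods) / n
    s = math.sqrt((mean_square - mean ** 2) * n / (n - 1))
    inv_alpha = 0.77970 * s
    return s, inv_alpha, mean - 0.57722 * inv_alpha


def theoretical_discharge(T, u, inv_alpha):
    """Rule 4: x = u + y / alpha with y = -ln(-ln(1 - 1/T)); requires T > 1."""
    y = -math.log(-math.log(1.0 - 1.0 / T))
    return u + inv_alpha * y


floods = [760, 912, 1018, 1185, 1260, 1353, 1426, 1536, 1813, 2334]  # hypothetical sample
exceedance, recurrence = observed_log_return_periods(floods)
s, inv_alpha, u = fit_constants(floods)
print(round(theoretical_discharge(10.0, u, inv_alpha)))
```

Plotting `exceedance`, `recurrence` and the curve of `theoretical_discharge` against the logarithm of T gives the kind of comparison shown in Figs. 1 and 4.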

The linear part of the theoretical curve $(x, \log T)$ permits of two
interpretations: First, $T$ is the theoretical return period of a flood greater than or equal
to $x$; second, $x$ is the most probable flood to be reached within $T$ years. The
second interpretation holds for the straight line through the point $(u, 0)$.

The figures show a close agreement between observed and theoretical values.
The observed curvature of the return periods is brought out by the theoretical
graph.

The agreement between theory and observation is excellent for floods which
correspond to reduced values of $y \le 3$. For the two or three extreme floods,
the return periods are based on a few observations and, consequently, the
agreement is not very good. No theory can be verified by two or three observations.
Generally speaking, the theory fits the observations as closely as could be
expected for such a complicated phenomenon.

In order to make a further test of our results, we need a numerical measure 
for the weights to be given to the theoretical points. Therefore, for a given 
probability we must find the corresponding theoretical limits for the observed 
return periods. The theory of positional values will give these control curves. 
Since it was the purpose of this article to develop and make clear the basic 
method, we have refrained from introducing this subject. 

It is our claim that the calculus of probabilities, and especially the theory of
largest values, is an efficient tool for the solution of certain hydrological problems.

ON THE FOUNDATIONS OF PROBABILITY AND STATISTICS¹

By R. von Mises
Harvard University

1. Introduction. The theory of probability and statistics which I have been
upholding for more than twenty years originates in the conception that the only 
aim of such a theory is to give a description of certain observable phenomena, 
the so called mass phenomena and repetitive events, like games of chance or 
some specified attributes occurring in a large population. Describing means 
here, in the first place, to find out the relations which exist between sequences 
of events connected in some way, e.g. a sequence of single games and the sequence 
composed of sets of those games or between a sequence of direct observations 
and the so called inverse probability within the same field of observations. The 
theory is a mathematical one, like the mathematical theory of electricity, based 
on experience, but operating by means of mathematical processes, particularly 
the methods of analysis of real variables and theory of sets. 

We all know very well that in colloquial language the term probability or
probable is very often used in cases which have nothing to do with mass
phenomena or repetitive events. But I decline positively to apply the
mathematical theory to questions like this: What is the probability that Napoleon was a
historical person rather than a solar myth? This question deals with an
isolated fact which in no way can be considered as an element in a sequence of
uniform repeated observations. We are all familiar with the fact that, e.g. the
word energy is often used in everyday language in a sense which does not
conform to the notion of energy as adopted in mathematical physics. This
does not impair the value of the precise definition of energy used in physics and
on the other hand this definition is not intended to cover the entire field of daily
application of the term energy.

We discard likewise the scholastic point of view displayed in a sentence of this
kind: ". . . that both in its meaning and in the laws which it obeys, probability
derives directly from intuition and is prior to objective experience." This
sentence is quoted from a mathematical paper printed in a mathematical journal
of 1940. The same author continues calling probability a metaphysical problem
and speaking of the difficulties "which must in the nature of things always be
encountered when an attempt is made to give a mathematical or physical
solution to a metaphysical problem." In my opinion the calculus of probability
has nothing to do with metaphysics, at any rate not more than geometry or
mechanics has.

¹ Address delivered on September 11, 1940 at a meeting of the Institute of Mathematical
Statistics in Hanover, N. H.

On the other hand we claim that our theory, which serves to describe
observable facts, satisfies all reasonable requirements of logical consistency and is
free from contradictions and obscurities of any kind. I am now going to outline
the essential ideas of the theory as developed by me since 1919 and I shall have
to refer as to the proof of its consistency to the recent work of A. H. Copeland,
of J. Herzberg and of A. Wald. Then I will give some examples of application
in order to show how the theory works and how it applies to actual problems in
statistics.

2. The notion of kollektiv. The basic notion upon which the theory is
established is the concept of kollektiv. We consider an infinite sequence of
experiments or observations every one of which supplies a definite result in the form
of a number (or a group of numbers in the case of a kollektiv of more than one
dimension). We shall designate briefly by X the sequence of results $x_1, x_2,
x_3, \ldots$ . In tossing a die we get for X an endless repetition of the integers one
to six, $x = 1, 2, \ldots, 6$. If we are interested in death probability, we observe a
large group of healthy 40 year old men and mark a one for each individual
surviving his 41st anniversary and a zero for each man who dies before, so that the
sequence $x_1, x_2, x_3, \ldots$ consists of zeros and ones. In a certain sense the
kollektiv corresponds to what is called a population in practical statistics.
Experience shows that in such sequences the relative frequency of the different
results (one to six in the first of our examples, one and zero in the second) varies
only slightly, if the number of experiments is large enough. We are therefore
prompted to assume that in the kollektiv, i.e. in the theoretical model of the
empirical sequences or populations, each frequency has a limiting value, if the
number of elements increases endlessly. This limiting value of frequency is
called, under certain conditions which I shall explain later, the "probability of
the attribute in question within the kollektiv involved." The set of all limiting
frequencies within one kollektiv is called its distribution.

Let me insist on the fact that in no case is a probability value attached to a
single event by itself, but only to an event inasmuch as it is the element of a well
defined sequence. It happens often that one and the same fact can be considered
as an element of different kollektivs. It may then be that different probability
values can be ascribed to the same event. I shall give a striking example of this,
which we encounter in the field of actual statistical problems, at the end of this
lecture.

The objection has been made: Since all empirical sequences are obviously
finite sequences, why then assume infinite kollektivs? Our answer is that any
straight line we encounter in reality has finite length, but geometry is based on
the notion of infinite straight lines and uses e.g. the notion of parallels which
has no sense, if we restrict ourselves to segments of finite lengths. Another
objection, often repeated, reads that there is a contradiction between the
existence of a frequency limit and the so called Bernoulli theorem which states that
sequences of any length showing a given frequency can also occur in cases for
which the probability has a different value. But it has been proved, in a rigorous
way excluding any doubt, that the two statements are compatible, even by explicit
construction of infinite sequences fulfilling both conditions. I would even claim
that the real meaning of the Bernoulli theorem is inaccessible to any probability
theory that does not start with the frequency definition of probability.

Now we are in the position to explain how our probability theory works.
This sequence of zeros and ones

(X)    1 0 1 | 0 0 1 | 1 0 0 | 0 1 1 | 1 1 0 | 0 1 1 | 0 1 0 | 1 1 1 ...

may represent the outcomes of a game of chance. The ones show gains, the
zeros losses for one of the two players. If we separate the terms of X into groups
of three digits and replace each group by a single one or zero according to the
majority of terms within the group, we get a new sequence

(X')   1 0 0 1 1 1 0 1 ...

which represents the gains and losses in sets of three games. Our task is now
to compute the distribution, i.e. the limiting frequencies of zeros and ones in
this new sequence X', assuming the two frequencies in X are known. A sequence
can formally be considered as a unique number like a decimal fraction with an
infinite number of digits. Then the transition from X to X' can be called a
transformation of a number, X' = T(X). As our sequences have to fulfill certain
conditions Copeland calls the sequences X, X' admissible numbers. What I
just quoted was of course a very special example of a transformation of a number.
But we have to emphasize that all problems dealt with in probability theory,
without any exception, have this unique form: The distribution or the limiting
frequencies in certain sequences are given, other sequences are derived from the
given ones by certain operations, and the distributions in these derived sequences
have to be computed. In other words: Probability theory is the study of
transformations of admissible numbers, particularly the study of the change of
distributions implied by such transformations.
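
The transformation from X to X' is easy to state as code. The following is a minimal sketch: the digits of X are those of the printed sequence read in groups of three, and the closed form for the limiting frequency, $p' = p^3 + 3p^2 q$, is the usual result for independent games, stated here as an assumption because the derivation belongs to Section 4. The function names are hypothetical.

```python
def majority_of_three(x):
    """Replace each complete group of three digits by its majority digit."""
    usable = len(x) - len(x) % 3
    return [1 if sum(x[i:i + 3]) >= 2 else 0 for i in range(0, usable, 3)]


def limiting_frequency_of_ones(p):
    """Frequency of ones in X' when ones have frequency p in X (assumes independence)."""
    q = 1.0 - p
    return p ** 3 + 3.0 * p ** 2 * q


X = [1, 0, 1,  0, 0, 1,  1, 0, 0,  0, 1, 1,  1, 1, 0,  0, 1, 1,  0, 1, 0,  1, 1, 1]
print(majority_of_three(X))
print(limiting_frequency_of_ones(0.5))
```

Applied to the first 24 digits of X, the function reproduces the first eight digits of X'.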

We know four and only four simple, i.e. irreducible, transformations or four
fundamental operations. They are called selection, mixing, partitioning and
combination. By combining these basic processes we can settle all problems
in probability theory. The formal, mathematical difficulties in carrying out the
computation of the new distributions may become very serious in certain cases,
particularly if we have to apply an infinite number of transformations
(asymptotic problems). But, in the clearly defined framework of this theory no space
is left for any metaphysical speculations, for ideas about sufficient reason or
insufficient reason, for notions like degree of evidence or for a special kind of
probability logic and so on. And further no modification is needed for handling usual
statistical problems: Terms like inverse probability, likelihood, confidence
degrees, etc. are justified and admitted only as far as they are capable of being
reduced to the basic notion of kollektiv and distribution within a kollektiv. I
will give some more details to this point later. Meanwhile let me turn to a
general question which, in a certain way, is the crucial point in establishing the
new probability theory.

3. Place selections and randomness. It is obvious that we have to restrict
still further the notion of kollektiv or the field of sequences which can be
considered as the objects of a probability investigation. The successive outcomes
of a game of chance differ very clearly from any regular sequence as defined by a
simple arithmetical law, e.g. the regularly alternating sequence 0 1 0 1
0 1 0 1 ... . A typical property which singles out the irregular or random
sequences and which has to be reproduced in every probability theory is that, if
$p$ is the probability of encountering a one in the sequence, then $p^2$ is the
probability of two ones following each other immediately. Any probability theory has
to introduce an axiom which enables us to deduce this theorem and others of a
similar type. The question is only how to find a sufficiently general and
consistent form for it. The procedure I have chosen consists in using a special kind
of transformation of a sequence, which I call a place selection.

A place selection is defined by an infinite set of functions $s_n(x_1, x_2, \ldots, x_{n-1})$
where $x_1, x_2, x_3, \ldots$ are the digits of an admissible number or a kollektiv and
$s_n$ has one of the two values zero or one. Here $s_n = 1$ means that the nth digit
of the sequence is retained, $s_n = 0$ means that it is discarded. The decision
about retaining or discarding the nth element depends, as you see, only on the
preceding values $x_1, x_2, \ldots, x_{n-1}$, but not on $x_n$ or the following digits. Example
of a place selection:

$s_n = 1$, if $x_{n-1} = 0$ for prime numbers n,
           if $x_{n-1} = 1$ for n not prime,

$s_1 = 1$, and $s_n = 0$ in all other cases.

Experience shows that, if we apply such a place selection to the sequence X
of outcomes of a game of chance, we get a new, selected sequence S(X) in which
the frequencies of gains and losses are about the same as in X. This fact, or
the practical impossibility of a gambling system, suggests the adoption of the
following procedure in handling transformations of admissible numbers.
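
The example of a place selection given above can be tried on a simulated sequence. The sketch below is illustrative only: a pseudo-random generator stands in for a game of chance, which is an assumption of the example and not a kollektiv in the strict sense, and the function names are hypothetical.

```python
import random


def is_prime(n):
    """Trial-division primality test for small positive integers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def place_selection(x):
    """Apply the example selection: s_1 = 1; for prime n keep x_n if x_{n-1} = 0;
    for n not prime keep x_n if x_{n-1} = 1; discard x_n otherwise."""
    selected = []
    for n, digit in enumerate(x, start=1):
        if n == 1:
            keep = True
        elif is_prime(n):
            keep = x[n - 2] == 0      # x[n - 2] is the preceding digit x_{n-1}
        else:
            keep = x[n - 2] == 1
        if keep:
            selected.append(digit)
    return selected


random.seed(0)                         # pseudo-random stand-in for a game of chance
X = [1 if random.random() < 0.3 else 0 for _ in range(50_000)]
S = place_selection(X)
print(sum(X) / len(X), sum(S) / len(S))
```

The two printed frequencies should be close to each other, which is the empirical fact the text appeals to. Note that the decision to keep the nth digit uses only earlier digits, never the digit itself.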

First, if within a certain investigation the transformation applied to X is a
place selection, we assume that the distribution in X' = S(X) is the same as
in X: distr S(X) = distr X. Second, if a general transformation T is applied
to X, say X' = T(X), then we examine whether the existence of a place selection
S that changes the distribution in X' (so as to have distr S(X') $\ne$ distr X')
implies the existence of a place selection $S_1$ that would affect the distribution in
X (so as to give distr $S_1$(X) $\ne$ distr X). If this is the case, we say that X' is
a kollektiv, provided that the original sequence X was considered to be a
kollektiv. Take e.g. for X the sequence resulting from tossing a die endlessly, and
call $p_1, p_2, \ldots, p_6$ the limiting frequencies of the six possible outcomes 1, 2, ..., 6.
The transformation T may consist in replacing every 1 in the sequence X by a
2, every 3 by a 4, and every 5 by a 6. The new sequence consists of only three
different kinds of elements 2, 4, 6 and therefore its distribution includes only
three values $p'_2, p'_4, p'_6$ where evidently $p'_2 = p_1 + p_2$ etc. Here it is almost
obvious that if a place selection applied to X' changes the value of $p'_2$, the same
selection if applied to X must change either $p_1$ or $p_2$. So, if the original sequence
X was considered as a kollektiv, X' has to be admitted too.

Now the question arises whether this procedure is in itself consistent or
whether it can lead to contradictions. We were concerned up to now with
kollektivs the elements of which belong to a finite set of distinct numbers
$c_1, c_2, \ldots, c_k$ and the distributions of which are therefore defined by k
non-negative values $p_1, p_2, \ldots, p_k$ with the sum 1. In this case it was pointed out
by Wald and by Copeland that, if an arbitrary distribution and an arbitrary
countable set $\Sigma$ of place selections are given, there exists a continuum of
sequences every one of which has the given distribution, which is not affected by
any place selection belonging to $\Sigma$. Now it may be supposed that in a concrete
problem a sequence X' is derived from a sequence X by a finite number of
fundamental operations involving a finite set $\Sigma'$ of place selections. Another
finite set $\Sigma''$ may consist of selections employed in establishing that certain
sequences used in the derivation of X' are "combinable" ones. Finally an
arbitrary countable set $\Sigma$ of selections S may be assumed. According to our
procedure we have shown that to any place selection S which affects the
distribution in X' corresponds a certain $S_1$ which, when applied to X, changes the
distribution of X. All these $S_1$ corresponding to the elements S of $\Sigma$ form a
countable set $\Sigma_1$. Now the set $\Sigma_2$ including $\Sigma'$, $\Sigma''$, $\Sigma_1$ and also including all
products of two of its own elements is a countable set too. What we use in
computing the distribution of X' is only the fact that the given sequence X is
unaffected by the selections that are elements of $\Sigma_2$. It follows from the above
quoted results that we can substitute for X a numerically specified sequence
and carry out all operations upon this specified sequence. So it is proved that
no contradiction can arise in computing the final probability according to our
conception.

I cannot enter here into a discussion of the more complicated case where the 
range within which the elements of a kollektiv vary, is an infinite one, either a 
countable set or a continuum. All principal problems connected with estab- 
lishing the notion of kollektiv can be settled satisfactorily, at any rate, by con- 
sidering those general forms of sequences as limiting cases of kollektivs with a 
finite set of attributes. 

4. Example: Set-of-games problem. I want to present now a simple, but
instructive example to show how the theory works and what task a mathematical
foundation of the calculus of probability has to achieve. Let us recall the two
sequences X and X' composed of zeros and ones of which we spoke above. The
first represented the outcomes of a sequence of single games, the second the
outcomes of triple sets of those games. If X is considered as a kollektiv with
given probabilities p and q for one and zero, it is easy to deduce the
corresponding values p' and q' for X' and to show that X' is a kollektiv too. We begin by
carrying out three selections which single out from the original sequence
$x_1, x_2, x_3, \ldots$ first, the elements $x_1, x_4, x_7, \ldots$; second, the elements
$x_2, x_5, x_8, \ldots$; and third, the elements $x_3, x_6, x_9, \ldots$. It can be shown by means of
certain further place selections that these three kollektivs, which we call
$X_1, X_2, X_3$, are combinable. That means that combining the corresponding elements of
the three sequences like $x_1x_2x_3$, $x_4x_5x_6$, $x_7x_8x_9, \ldots$ leads to a new
three-dimensional kollektiv $X_0$ in which each permutation of three digits 0 and 1 has a
probability equal to the corresponding product of p- and q-factors. For
instance the probability of encountering the group 111 is $p^3$ and for the group 110
it is $p^2q$. Now we operate a mixing upon $X_0$ by collecting all permutations
with two or three ones. We find in a well known way the sum $p^3 + 3p^2q$ for
the probability p' of ones in the sequence X'. So far the result is very well
known and can be reached (in my opinion, in a very incomplete and
unsatisfactory way) also by the classical methods.
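
The mixing step can be checked by brute force. The following is only a sketch under the assumption that the three games of a set are independent with the same probability p of a one; the function name `prob_majority_of_three` is ours, not the author's. It enumerates the eight groups of $X_0$ and adds the products belonging to groups with two or three ones.

```python
from itertools import product


def prob_majority_of_three(p: float) -> float:
    """Sum the product probabilities of all 0/1 triples containing two or three ones."""
    q = 1.0 - p
    total = 0.0
    for triple in product((0, 1), repeat=3):
        ones = sum(triple)
        if ones >= 2:  # the "mixing": keep the groups 111, 110, 101, 011
            total += p ** ones * q ** (3 - ones)
    return total


p = 0.6  # hypothetical value, for illustration only
# Both numbers agree up to rounding: the enumeration and the closed form p^3 + 3 p^2 q.
print(prob_majority_of_three(p), p ** 3 + 3 * p ** 2 * (1 - p))
```

For p = 1/2 the function returns 1/2, as symmetry requires.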

But what I want to discuss here is a slightly modified question. If the
sequence X means gains and losses for single games and if the arrangement for
sets of three games is made as indicated before, then in a real play the gains
and losses of sets are counted in a different way. For, if the first two games of
a set are both won or lost by the same player, the fate of the set is decided and
there is no sense in playing the third game. So the loss of the second set in our
example will already be recognized after the fifth game and the actual sixth
game will be considered as the first game of the third set. In this way the
original sequence X decomposed into groups of two or three games

(X)   1 0 1 | 0 0 | 1 1 | 0 0 | 0 1 1 | 1 1 | 0 0 | 1 1 | 0 1 0 | 1 1 | ...

leads to a new sequence X''

(X'')  1 0 1 0 1 1 0 1 0 1 ...

which is obviously different from X'. Everyone familiar with the usual
handling of the probability concept will say that in X'' the probabilities of zeros and
ones must be the same as in X'. But a mathematical foundation of the theory of
probability, if it deserves this name, has to clear up the question: From what
principles or particular assumptions and by what inferences may we deduce the
equality of the limiting frequencies in X' and X''?
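
The two ways of counting sets can be written out explicitly. This is a sketch only: the list `X` is our transcription of the displayed sequence (reading the bars as set boundaries), and the function names are hypothetical. `fixed_triples` forms X' from consecutive groups of three games; `early_stop_sets` forms X'' by ending a set as soon as its first two games agree.

```python
def fixed_triples(x):
    """X': one result per complete group of three games (majority of the group)."""
    return [1 if sum(x[i:i + 3]) >= 2 else 0 for i in range(0, len(x) - 2, 3)]


def early_stop_sets(x):
    """X'': a set ends after two games if they agree, otherwise the third game decides."""
    out, i = [], 0
    while i + 1 < len(x):
        if x[i] == x[i + 1]:      # set decided after two games
            out.append(x[i])
            i += 2
        elif i + 2 < len(x):      # tie after two games: third game decides
            out.append(x[i + 2])
            i += 3
        else:                     # incomplete set at the end of the data
            break
    return out


# Transcription of the displayed sequence (X), without the separating bars.
X = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1]

print(early_stop_sets(X))  # the sequence X''
print(fixed_triples(X))    # the beginning of X'
```

The two outputs differ term by term, which is the point of the example: the equality of the limiting frequencies in X' and X'' is a statement about the randomness of X, not about any finite stretch of it.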

There is no difficulty in solving this problem from the point of view of the
frequency theory. We have only to apply somewhat different place selections
instead of those used above, which led to the kollektivs $X_1, X_2, X_3$. I showed
elsewhere how the general set-of-games problem can be satisfactorily treated in
this way. Here I want to stress only that the problem as a whole is completely
inaccessible by any of the other known approaches to probability theory. The
classical point of view which starts with the notion of equally likely cases and
rests upon a rather vague idea of the relationship between probability and
sequences of events does not even allow the formulation of the problem. In
the so-called modernized classical theory, as proposed by Fréchet, probabilities
are defined as "physical magnitudes of which frequencies are measures."
Fréchet would say that the frequencies both in X' and in X'' are measures of
the same quantity. But why? We face here obviously a mathematical
question which cannot be settled by referring to physical facts. It is clear that the
equality of the distributions in the two sequences X' and X'' is due to the
randomness or irregularity of the original sequence X. No theory which does
not take into account the randomness, which avoids referring to this essential
property of the sequences dealt with in probability problems, can contribute
anything toward the solution of our question.

I have to make some special remarks about the so-called measure theory of
probability.²

5. Probability as measure. Up to now we have been concerned only with
the simplest type of kollektivs, namely, with those sequences the elements of
which belong to a finite set of numbers so as to have a distribution consisting
of a finite number of finite probabilities with the sum 1. It may be true that
all practical problems, in a certain sense, fall into this range. For, the single
result of an observation is always an integer, the number of smallest units
accessible to the actual method of measuring. Nevertheless in many cases it
is much more useful to adopt the point of view that the possible outcomes of an
experiment belong to a more general set of numbers, e.g. to a continuous segment
or any infinite variety. If we include the case of kollektivs of more than one
dimension, we have to consider a point set in a k-dimensional space (where
even k may be infinite) as the label set or attribute set of the kollektiv. In
order to define the probability in this case we have to choose a subset A of the
label set and to count among the first n elements the number $n_A$ of those elements
the attributes of which fall into A. Then the quotient $n_A : n$ is the frequency,
and its limiting value for n infinite will be called the probability of the attribute
falling into A within the given kollektiv.

It was rightly stressed by many authors that in the case of an infinite label set
some additional restrictions must be introduced. In particular A. Kolmogoroff
set up a complete system of such restrictions. We cannot ask for the
existence of the limiting frequency in any arbitrary subset A. It will be sufficient
to assume that the limit exists for a certain Körper or a certain additive family
of subsets. If it exists for two mutually exclusive subsets A and B, the limit
corresponding to A + B will be, by virtue of the original definition, the sum of
the limits connected with A and B. We can now insert a further axiom involving
the complete additivity of the limiting values. So we arrive at the statement
that probability is the measure of a set. All axioms of Kolmogoroff can be
accepted within the framework of our theory as a part of it, but in no way as a
substitute for the foregoing definition of probability.

² What I call measure theory here is essentially that proposed by Kolmogoroff in his
pamphlet of 1933. As to the new theory developed by Doob in his following paper (where
instead of the label space the space of all logically possible sequences is used in establishing
the measures) see my comment on page 215.

Occasionally the expression "probability as measure theory" is used in a
different sense. One tries to base the whole theory on the special notion of a set
of measure zero. One of the basic assumptions in my theory is that in the
sequence of results we obtain in tossing a so-called correct die the frequency,
say of the point 6, has a certain limiting value which equals 1/6. A different
conception consists in stating that anything can happen in the long run with a
correct die, even that an uninterrupted sequence of sixes or an alternating
sequence of twos and fours and so on may appear. Only all these events which
do not lead to the limiting frequency 1/6 form, together as a whole, a set of
events of measure zero. Instead of my assumption: the limiting value is 1/6
we should have to state: It is almost certain that a limit exists and equals 1/6.
Nothing can be said against such an alluring assumption from an empirical
standpoint, since actual experience extends in no case to an infinite range of
observations. The only question is whether the assumption is compatible with
a complete and consistent theory. I cannot see how this may be achieved.
Before saying that a set has measure zero we have to introduce a measure system,
which can be done in innumerable ways. If e.g. we denote the outcome six by a
one and all other outcomes 1 to 5 by zero, we get as the result of the game with
a die an infinite sequence of zeros and ones. It has been shown by Borel that
according to a common measure system the set of all 0, 1 sequences which do not
have the limiting frequency 1/2 has the measure zero. In this way it turns out
to be almost certain that the limiting frequency of the outcome six in the case
of a correct die is 1/2. Other values for the limit can be obtained by a similar
inference. It is a correct but misleading idea that the measure zero is unaffected
by a regular (continuous) transformation of the assumed measure system, since
in our field of problems different measures which are not obtained from one
another by a regular transformation have equal rights. So, saying that a certain
set has the measure zero makes in our case no more sense than to state that an
unknown length equals 3 without indicating the employed unit.

In recapitulating this paragraph I may say: First, the axioms of Kolmogoroff
are concerned with the distribution function within one kollektiv and are
supplementary to my theory, not a substitute for it. Second, using the notion of
measure zero in an absolute way, without reference to the arbitrarily assumed
measure system, leads to essential inconsistencies.

6. Statistical estimation. Let me now turn to the last point, the application
of probability theory to one of the most widely discussed questions in today's
statistical research: the so-called estimation problem. Many strongly divergent
opinions are facing each other here. I think that the probability theory based
on the notion of kollektiv is best able to settle the dispute and to clear up the
difficulties which arose in the controversies of different writers.

We may, without loss of generality, restrict ourselves to the simplest case
of a single statistical variable x and a single parameter $\vartheta$, where x of course may
be the arithmetical mean of n observed values. Here (and likewise in the case
of more variables and more parameters) we have to distinguish carefully among
four different kollektivs which are simultaneously involved in the problem.
The range within which both x and $\vartheta$ vary will be assumed to be a continuous
interval so that all distributions will be given by probability densities.

The first kollektiv we deal with is a one-dimensional one where the probability
of x falling into the interval x, x + dx depends on x and on a parameter $\vartheta$. If

(1)    $p(x \mid \vartheta)$

denotes the corresponding density and the limits A, B within which x possibly
falls depend on $\vartheta$ too, we have

(1')   $\int_{A(\vartheta)}^{B(\vartheta)} p(x \mid \vartheta)\, dx = 1$   for each $\vartheta$.


In order to fix the ideas we may imagine that the first kollektiv consists in
drawing a number x out of an urn and that $\vartheta$ characterizes the contents of the
urn. Asking for an estimate of $\vartheta$ implies the assumption that different possible
urns are at our reach every one of which can be used for drawing the x. The
$\vartheta$-values for the different urns fall into a certain interval C, D. It is usual to
suppose that the urns are picked out at random so as to give another one-dimensional
kollektiv with the independent variable $\vartheta$. Let $p_0(\vartheta)\, d\vartheta$ be the probability of
picking an urn with the characteristic value falling into the interval $\vartheta$, $\vartheta + d\vartheta$.
This density

(2)    $p_0(\vartheta)$

is often called the prior or a priori probability of $\vartheta$. As the range within which
$\vartheta$ varies is confined by the constants C and D, we have obviously

(2')   $\int_C^D p_0(\vartheta)\, d\vartheta = 1.$

Now from these two one-dimensional kollektivs with the variables x in the
first, $\vartheta$ in the second, we deduce by combination (multiplication) a
two-dimensional kollektiv with the density function

(3)    $P(\vartheta, x) = p_0(\vartheta) \cdot p(x \mid \vartheta).$

The individual experiment which forms the element of this third kollektiv
consists of picking at random an urn and drawing afterwards from this urn. Both
x and $\vartheta$ are now independent variables (attributes of the kollektiv) and it is easy
to see that it follows from (1') and (2')


(3')   $\iint P(\vartheta, x)\, dx\, d\vartheta = \int_C^D p_0(\vartheta)\, d\vartheta \int_{A(\vartheta)}^{B(\vartheta)} p(x \mid \vartheta)\, dx = 1.$

We will return later to this two-dimensional kollektiv. Let us, first, derive
from it, by applying the operation of partitioning (Teilung), our fourth and last
kollektiv which is one-dimensional again. Partitioning means that we drop
from the sequence of experiments which form the third kollektiv all those for
which the x-value falls outside a certain interval x, x + dx; and that in this
way we consider a partial sequence of experiments with only the one variable $\vartheta$.
The distribution of $\vartheta$-values within this sequence with quasi-constant x is given,
according to the well known rule of division or rule of Bayes (a rule which can
be proved mathematically) by³

(4)    $p_1(\vartheta \mid x) = \dfrac{P(\vartheta, x)}{\int_C^D P(\vartheta, x)\, d\vartheta} = c(x)\, p_0(\vartheta)\, p(x \mid \vartheta).$

It follows immediately that

(4')   $\int_C^D p_1(\vartheta \mid x)\, d\vartheta = 1.$

This function $p_1$ of $\vartheta$ depending on the parameter x is generally called the
posterior or a posteriori probability of $\vartheta$.
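
Formula (4) is easy to evaluate numerically when the prior is known. The sketch below is only an illustration: the prior $2\vartheta$ on [0, 1], the normal form chosen for $p(x \mid \vartheta)$, and all function names are assumptions of ours, not taken from the text. The integral over $\vartheta$ is replaced by a midpoint sum, and the last line checks the normalization (4').

```python
import math


def normal_density(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_on_grid(prior, likelihood, x, c, d, steps=2000):
    """Return (grid, p1) with p1 approximating c(x) * p0(theta) * p(x | theta) on [c, d]."""
    h = (d - c) / steps
    grid = [c + (i + 0.5) * h for i in range(steps)]   # midpoints of the cells of [C, D]
    weights = [prior(t) * likelihood(x, t) for t in grid]
    norm = sum(weights) * h                            # approximates the integral of P(theta, x)
    return grid, [w / norm for w in weights]


def prior(t):
    """Hypothetical prior density p0 on [0, 1]."""
    return 2.0 * t


def likelihood(x, t):
    """Hypothetical density p(x | theta): normal around theta with sd 0.1."""
    return normal_density(x, t, 0.1)


grid, p1 = posterior_on_grid(prior, likelihood, x=0.3, c=0.0, d=1.0)
h = grid[1] - grid[0]
print(sum(p1) * h)  # numerical check of (4'): total posterior mass, close to 1
```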

If $p_1(\vartheta \mid x)$ can be computed according to the formula (4), every question
concerning the "presumable" value of $\vartheta$ as drawn from the outcome x of an
experiment is completely answered. We can find indeed, by integration, the
probability which corresponds to any part of the interval C, D of $\vartheta$ and so the
estimation problem is definitely solved. But the trouble is that in most cases of
practical application nothing or almost nothing is known about the prior
probability $p_0(\vartheta)$ which appears as a factor in the expression of $p_1$. Hence arises
the new question: What can we say about the $\vartheta$-values without having any
information about its prior probability? This is the estimation problem as it is generally
conceived today.

The first successful approach to the answering of this question was made by
Gauss. If we do not know $p_1$, we know however, except for a constant factor,
the quotient $p_1/p_0$, posterior probability to prior probability, which equals
$c\, p(x \mid \vartheta)$. The maximum of this quotient must be greater than one, since the
average values of both $p_0$ and $p_1$ are the same. So the maximum means the
point of the greatest increase produced by the observed experimental value of x
upon the probability of $\vartheta$. It seems reasonable to assume the $\vartheta$-value for which
the ratio $p_1/p_0$ reaches its maximum as an estimate for $\vartheta$. It is the value upon
which the greatest emphasis is conferred by the observation. This idea,
originally proposed by Gauss in his theory of errors, has been later developed chiefly
by R. A. Fisher, and is known today as the maximum likelihood method. Calling
the ratio $p_1/p_0$ likelihood seems indeed an adequate nomenclature.
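
That the place of the maximum of $p_1/p_0$ does not depend on the prior can be seen numerically. The following sketch rests on assumptions of ours (a normal form for $p(x \mid \vartheta)$, the range [0, 1] for $\vartheta$, two arbitrary priors, hypothetical function names); it evaluates the ratio on a grid for both priors and reports where it is largest.

```python
import math


def likelihood(x, t, sd=0.1):
    """Hypothetical density p(x | theta): normal around theta."""
    return math.exp(-0.5 * ((x - t) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def ratio_argmax(prior, x, steps=1000):
    """Return (theta, ratio) at the grid point where p1/p0 is largest."""
    h = 1.0 / steps
    grid = [i * h for i in range(1, steps + 1)]        # theta in (0, 1]
    joint = [prior(t) * likelihood(x, t) for t in grid]
    norm = sum(joint) * h                              # Riemann sum for the integral of P(theta, x)
    ratio = [(j / norm) / prior(t) for j, t in zip(joint, grid)]   # p1 / p0
    best = max(range(steps), key=lambda i: ratio[i])
    return grid[best], ratio[best]


x_observed = 0.3
flat = ratio_argmax(lambda t: 1.0, x_observed)         # uniform prior
tilted = ratio_argmax(lambda t: 2.0 * t, x_observed)   # prior favouring large theta
print(flat, tilted)
```

Both priors give the same maximizing $\vartheta$ (the grid point nearest the observed x), while the value of the ratio there differs through the factor c and exceeds one in both cases.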


³ For brevity Bayes' rule is employed in the text as in the case of a discontinuous
distribution. The correct procedure in the case of a continuous x would require that we first
use finite intervals and then pass to the limit.

The method of estimation used most frequently today is not the maximum
likelihood method, but the so-called confidence interval method, inaugurated
by R. A. Fisher and now successfully extended and applied by J. Neyman. This
method uses the third of the above mentioned kollektivs instead of the fourth,
i.e. the two-dimensional probability $P(\vartheta, x)$. At first sight it seems hopeless
to use this function which includes the unknown prior probability $p_0(\vartheta)$ as a
factor. But it turns out, as Neyman has shown⁴ (and this is the decisive idea
of the confidence interval method), that we can indicate in the x, $\vartheta$-plane special
regions for which the probability $\iint P(\vartheta, x)\, dx\, d\vartheta$ is independent of $p_0(\vartheta)$. In
fact, if we point out for every $\vartheta$ such an interval $x_1(\vartheta)$, $x_2(\vartheta)$ as to have

(5)    $\int_{x_1(\vartheta)}^{x_2(\vartheta)} p(x \mid \vartheta)\, dx = \alpha, \qquad 0 < \alpha < 1,$

it follows immediately from (2') and (5) for the region covered by these intervals

(6)    $\iint P(\vartheta, x)\, dx\, d\vartheta = \int_C^D p_0(\vartheta)\, d\vartheta \int_{x_1(\vartheta)}^{x_2(\vartheta)} p(x \mid \vartheta)\, dx = \alpha.$

For given $\alpha$ the intervals can be chosen in different ways. If we choose $x_1 = A$
for $\vartheta = C$ and $x_2 = B$ for $\vartheta = D$, we get a strip or belt, as shown in Fig. 1,
which supplies for every given x a smallest value $\vartheta_1$ and a greatest value $\vartheta_2$.
The definition of our third kollektiv leads to the conclusion: If we predict each
time a certain x is observed that $\vartheta$ lies between the corresponding $\vartheta_1$ and $\vartheta_2$, then
the probability is $\alpha$ that we are right, whatever the prior probability may be.⁵ It is
⁴ J. Neyman, Roy. Stat. Soc. Jour., Vol. 97 (1934), pp. 590-92.

⁵ After my lecture Dr. A. Wald called my attention to Neyman's suggestion; namely
that this statement can be generalized by admitting that the infinite sequence of $\vartheta$-values
which results from picking out successively the urns for drawing a number x, does not
fulfill the conditions of a kollektiv. So, instead of the terms "whatever the prior
probability may be" we can say "whatever the method of picking out the urns may be." In
fact, let us consider the case where $\vartheta$ can assume only a finite number of values
$\vartheta_1, \vartheta_2, \ldots, \vartheta_k$. Among the n first trials let $n_\kappa$ be the number of cases where
$\vartheta = \vartheta_\kappa$ and $n'_\kappa$ the number of cases where $\vartheta = \vartheta_\kappa$ and x falls into the interval
$x_1(\vartheta_\kappa)$, $x_2(\vartheta_\kappa)$. The relative

understood that in this argument both x and $\vartheta$ are variables the values of which
may change from one trial to the next. I cannot agree with the statement,
which is often made, that x only is a variable and $\vartheta$ a constant or that we are
only interested in one specified value of $\vartheta$. In no way is it possible, in the
framework of the confidence limits method, to avoid the idea of a so-called
superpopulation, i.e. the existence of a manifold of urns every one of which forms
a kollektiv.⁶ Thus no contradiction and no antagonism exists between this
method and the Bayes formula. Only a different kollektiv, a two-dimensional
instead of a one-dimensional, is here considered.

I have no time to enter here into a discussion of the very interesting
developments of Neyman's theory which are intended to supply additional conditions
in order to determine the arbitrary choice of the x-intervals in a unique way.
May I only mention that what is called in Neyman's theory the probability of a
second type error in testing the hypothesis $\vartheta = \vartheta_0$ is given by the expression

(7)    $\iint P(\vartheta, x)\, dx\, d\vartheta = \int_C^D p_0(\vartheta)\, d\vartheta \int_{x_1(\vartheta_0)}^{x_2(\vartheta_0)} p(x \mid \vartheta)\, dx.$

If we want to determine the confidence belt or the intervals $x_1$, $x_2$ in such a way
as to minimize this expression independently of the function $p_0(\vartheta)$, we obtain
Neyman's maximum power condition

(8)    $\int_{x_1(\vartheta_0)}^{x_2(\vartheta_0)} p(x \mid \vartheta)\, dx = F(\vartheta, \vartheta_0) = \min.$   for each pair $\vartheta$, $\vartheta_0$.

This condition, it is well known, cannot be fulfilled under general assumptions
for $p(x \mid \vartheta)$. Moreover the above-mentioned boundary conditions $x_1(C) = A(C)$
and $x_2(D) = B(D)$ (or similar ones in other cases) have to be considered
too. If they are not satisfied, the statement which can be made with probability
$\alpha$ would include the prediction that certain x-values are impossible. Except
for this case the above formulated theorem is equally valid for every region
determined according to (6).

It is clear that if the original distribution is given by a regular, slightly
varying function $p(x \mid \vartheta)$, the confidence limits method cannot give very substantial
results. Let us take e.g. for $p(x \mid \vartheta)$ the uniform distribution

(9)    $p(x \mid \vartheta) = 1/\vartheta$   for $0 \le x \le \vartheta$, $0 \le \vartheta \le 1$.


frequency of correct predictions is then $(n'_1 + n'_2 + \cdots + n'_k) : n$ where n equals
$n_1 + n_2 + \cdots + n_k$. If n tends to infinity, at least one part of the $n_\kappa$ must become
infinite. For those the limit of $n'_\kappa / n_\kappa$ tends to $\alpha$ according to (6) while the other
terms (with finite $n_\kappa$ and $n'_\kappa$) have no influence. So the limiting value of the
frequency $(n'_1 + n'_2 + \cdots + n'_k) : n$ equals in any event $\alpha$. This generalization does not
apply, if we ask for the probability of a second type error of the hypothesis
$\vartheta = \vartheta_0$. Here the existence of the prior probability $p_0$ is essential.

⁶ According to the generalization supplied by Neyman's point of view (Phil. Trans.
Roy. Soc., Vol. A-236 (1937), pp. 339-380) which is discussed in footnote 5, the
superpopulation does not necessarily satisfy the conditions of a kollektiv.

We have here A = 0, B = $\vartheta$, C = 0, D = 1 and the domain in which x and
$\vartheta$ vary is the 45° right triangle shown in Fig. 2. Whatever $p_0(\vartheta)$ may be, the
integral of $P(\vartheta, x) = p_0(\vartheta) \cdot p(x \mid \vartheta)$ over this domain is 1 and if we omit the
part of the triangle on the left of the straight line $x = (1 - \alpha)\vartheta$, the integral
over the remaining part is $\alpha$. For $\alpha = 0.90$, a statement which can be made
with a probability of 90% reads: The value of $\vartheta$ lies between x and 10x. On
the other hand we know from the very beginning with 100% certainty that $\vartheta$
lies between x and 1, so that for $x \ge 0.1$ the statement is futile. (If one chooses
as confidence belt the part on the left of the straight line $x = \alpha\vartheta$, the statement
would run: $\vartheta$ lies between 1.1 x and 1 and values of x greater than 0.9 are
impossible.) If we apply in this case the Bayes formula, we find that the
outcome depends to the highest extent on what is known about the prior
probability $p_0(\vartheta)$.
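
The independence of the probability $\alpha$ from the prior can be illustrated by simulation for the uniform example (9). This is a sketch, not part of the original argument: the three ways of picking urns, the seed, and the name `coverage` are our own choices. Each trial picks $\vartheta$, draws x with density $1/\vartheta$ on $[0, \vartheta]$, and records whether $\vartheta$ lies between x and $x/(1 - \alpha)$.

```python
import random


def coverage(draw_theta, alpha=0.90, trials=100_000, seed=0):
    """Relative frequency of trials in which x <= theta <= x / (1 - alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = draw_theta(rng)               # pick an urn
        x = rng.uniform(0.0, theta)           # draw x with density 1/theta on [0, theta]
        if x <= theta <= x / (1.0 - alpha):   # the prediction "theta lies between x and 10x"
            hits += 1
    return hits / trials


# Three hypothetical ways of picking the urns (values of theta in (0, 1]).
uniform_prior = coverage(lambda rng: 1.0 - rng.random())
skewed_prior = coverage(lambda rng: (1.0 - rng.random()) ** 3)
single_urn = coverage(lambda rng: 0.25)
print(uniform_prior, skewed_prior, single_urn)
```

All three frequencies come out close to 0.90, although the distributions of $\vartheta$ are very different; the third case, with a single urn, corresponds to the situation discussed in footnote 5.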

In most cases however which present themselves in practical statistics the
original density function $p(x \mid \vartheta)$ has a different character from that assumed in
(9). It depends generally on an integer n and the distribution is concentrated
more and more when n increases. (We may define here concentration as
standard deviation tending towards zero. The integer n means in general the
number of basic experiments.) We have e.g. in the so-called Bayes problem
where x is the arithmetical mean of n observations the asymptotic expression
for p:

(10)   $p(x \mid \vartheta) \sim \sqrt{\dfrac{n}{2\pi\,\vartheta(1-\vartheta)}}\; \exp\!\left(-\dfrac{n\,(x-\vartheta)^2}{2\,\vartheta(1-\vartheta)}\right), \qquad 0 \le \vartheta \le 1,\ 0 \le x \le 1.$


If we denote by $\Phi$ the probability integral

(11)   $\Phi(x) =$ [integral expression illegible in the source]

the x-intervals corresponding to a given probability value $\alpha$ are defined by

(12)   [display illegible in the source]

If n has a large value, the $\varepsilon$'s are very small and we get a narrow belt along the
straight line $x = \vartheta$ as shown in Fig. 3 for $\alpha = 0.90$ and n about 100. The
prediction which can be made with the probability $\alpha$ reads approximately

(13)   [display illegible in the source]

On the other hand it is well known that in this case the Bayes formula supplies
a posterior probability $p_1(\vartheta \mid x)$ which turns out to be more and more independent
of the prior probability $p_0(\vartheta)$ when n increases. It has been shown that the
asymptotic expression for $p_1(\vartheta \mid x)$, whatever $p_0(\vartheta)$ may be, is

(14)   $p_1(\vartheta \mid x) \sim \sqrt{\dfrac{n}{2\pi\,x(1-x)}}\; \exp\!\left(-\dfrac{n\,(\vartheta-x)^2}{2\,x(1-x)}\right).$

It follows that, on the basis of the Bayes formula, we can predict for every
single value of x with the probability $\alpha$ that $\vartheta$ lies between the above given
limits (13). This is more than the confidence limits method supplies, but the
result is subjected to the restriction that $p_0(\vartheta)$ is a continuous function.
However, for large values of n (generally this means for large numbers of basic
experiments) the outcomes of both methods are essentially the same.
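
The agreement of the two methods for large n can be illustrated numerically. The sketch below makes several assumptions of its own: x is the relative frequency k/n of ones in n alternatives, the belt is taken symmetric around x with half-width $z\sqrt{x(1-x)/n}$ (z being the standard normal quantile, which is not necessarily the author's normalization of $\Phi$), and the two priors and all names are hypothetical. It integrates the exact posterior over that interval for both priors.

```python
import math
from statistics import NormalDist


def posterior_mass(prior, k, n, lo, hi, steps=20_000):
    """Posterior probability that theta lies in [lo, hi], by a midpoint sum on (0, 1)."""
    h = 1.0 / steps
    total = inside = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        # prior times theta^k (1 - theta)^(n - k), computed through logarithms
        w = prior(t) * math.exp(k * math.log(t) + (n - k) * math.log(1.0 - t))
        total += w
        if lo <= t <= hi:
            inside += w
    return inside / total


alpha, n, k = 0.90, 400, 160                     # hypothetical data: 160 ones in 400 trials
x = k / n
z = NormalDist().inv_cdf((1.0 + alpha) / 2.0)    # two-sided standard normal quantile
eps = z * math.sqrt(x * (1.0 - x) / n)           # half-width of the approximate belt at x

flat_mass = posterior_mass(lambda t: 1.0, k, n, x - eps, x + eps)
tilted_mass = posterior_mass(lambda t: 2.0 * t, k, n, x - eps, x + eps)
print(flat_mass, tilted_mass)
```

For n = 400 both posterior probabilities come out near 0.90, so the Bayes formula with either continuous prior assigns to the interval about the same probability $\alpha$ that the confidence argument does.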

Let me recapitulate in three brief sentences the essential results we have 
found in the problem of estimation. 

1. There is no contradiction of any kind between the Bayes formula and the 
confidence limits method and no difference at all in the underlying probability 
concept. In both methods the idea of a sort of “super-population” is used. 
Only two different kollektivs are considered in both cases. 

2. If the original distribution has a regular, slightly varying density function
$p(x \mid \vartheta)$, the Bayes method gives a complete answer when the prior probability
is known and no answer when it is unknown. The confidence limits method gives
in both cases a definite solution; it lies in the nature of things that the solution
cannot be very substantial if $p(x \mid \vartheta)$ is only slightly varying.

3. If the original distribution $p(x \mid \vartheta)$ depends on a further parameter n and
becomes concentrated more and more with increasing n, both approaches give,
for large n, asymptotically about the same results.

It is not intended by these remarks to impair the value of the confidence 
limits method which both from theoretical and from practical point of view 
deserves our attention. But the rather inconceivably aggressive attitude 
towards the Bayes’ theory as displayed by a number of statisticians, which, 
however, does not include J. Neyman, turns out to be completely unfounded. 



PROBABILITY AS MEASURE 

By J. L. Doob
University of Illinois 

The following pages outline a treatment of probability suitable for statisti- 
cians and for mathematicians working in that field. No attempt will be made 
to develop a theory of probability which does not use numbers for probabilities. 
The theory will be developed in such a way that the classical proofs of proba- 
bility theorems will need no change, although the reasoning used may have a 
sounder mathematical basis. It will be seen that this mathematical basis is 
highly technical, but that, as applied to simple problems, it becomes the set-up 
used by every statistician. The formal and empirical aspects of probability 
will be kept carefully separate. In this way, we hope to avoid the airy flights 
of fancy which distinguish many probability discussions and which are irrelevant 
to the problems actually encountered by either mathematician or statistician. 

We shall identify as Problem I the problem of setting up a formal calculus to 
deal with (probability) numbers. Within this discipline, once set up, the only 
problems will be mathematical. The concepts involved will be ordinary mathe- 
matical ones, constantly used in other fields. The words “probability,” 
“independent,” etc. will be given mathematical meanings, where they are used. 

We shall identify as Problem II the problem of finding a translation of the 
results of the formal calculus which makes them relevant to empirical practice. 
Using this translation, experiments may suggest new mathematical theorems. 
If so, the theorems must be stated in mathematical language, and their validity 
will be independent of the experiments which suggested them. (Of course, if a 
theorem, after translation into practical language, contradicts experience, the 
contradiction will mean that the probability calculus, or the translation, is 
inappropriate.) 

The classical probability investigators did not separate Problems I and II
carefully, thinking of probability numbers as numbers corresponding to events
or to hypothetical truths, and always referring the numbers back to their
physical counterparts. The measure approach to the probability calculus has
put this approach into abstract form, and separated out the empirical elements,
thus removing all aspects of Problem II. We shall explain this approach first
in a simplified set-up, that which will be made to correspond (Problem II) to a
repeated experiment in which the result of the nth trial can be any integer $x_n$
between 1 and N (inclusive), in which the experiments are independent of each
other, and performed under the same conditions. (The set-up will be applicable,
for example, to the repeated throwing of a die.)

The measure approach treats this experiment as follows. Let $\omega$: $(x_1, x_2, \ldots)$
be any sequence of integers between 1 and N, inclusive. We consider $\omega$ as a
point in an infinite dimensional space $\Omega$. (Each point $\omega$ may be considered as a
logically possible sequence of results of the given experiment, and this fact will
guide us in solving Problem II.) A measure function is defined on certain sets
of points of $\Omega$ as follows. Let $p_1, \ldots, p_N$ be any numbers satisfying the
conditions

$p_i \ge 0, \quad i \ge 1, \qquad p_1 + \cdots + p_N = 1.$

(How these numbers are chosen in any particular problem will be explained
below. The method of choice is irrelevant to the mathematics, but is involved
in the solution of Problem II.) The set of all sequences beginning with $x_1 = a$
is given measure $p_a$. More generally, the measure of the set of all sequences
beginning with $x_1 = a_1, \ldots, x_n = a_n$ is defined as $p_{a_1} p_{a_2} \cdots p_{a_n}$. In this
way, as can be shown,¹ a completely additive measure function is determined
on certain point sets of $\Omega$, on a field $\mathcal{F}$ of sets so large that all the usual Lebesgue
measure and integration theory is applicable. This means that there is a
collection $\mathcal{F}$ of sets of points of $\Omega$ such that if $S_1, S_2, \ldots$ are finitely or infinitely
many sets in the collection, their sum $\sum_1^\infty S_n$, their intersection $\prod_1^\infty S_n$, and
their complements are also in the collection. Each set S in $\mathcal{F}$ has a definite
measure P(S), $0 \le P(S) \le 1$, and if $S_1, S_2, \ldots$ are finitely or infinitely many
disjunct sets in $\mathcal{F}$,

$P(S_1 + S_2 + \cdots) = P(S_1) + P(S_2) + \cdots.$

Problem II, the translation problem, is solved as follows. Each relevant
event is made to correspond to a point set of $\Omega$. A relevant event is a physical
concept, defined by imposing some set C of conditions on the results of the
experiments. The corresponding $\Omega$-set is the set of sequences $(x_1, x_2, \ldots)$
satisfying the same set C of conditions, imposed on the $x_j$. Thus the set of all
sequences beginning with $x_1 = a_1$, $x_2 = a_2$ is made to correspond to the event:
the result of the first experiment is $a_1$, of the second is $a_2$. As is to be expected,
the mathematical picture goes further than the real one. The "event" 1 occurs
infinitely often in a sequence of trials has only conceptual significance, physically,
but the corresponding point set of $\Omega$: the set of all sequences $(x_1, x_2, \ldots)$
containing infinitely many 1's, is a perfectly definite point set whose measure can
be calculated in terms of $p_1, \ldots, p_N$. (In fact it is easily seen that this
measure is 1 or 0, according as $p_1 > 0$ or $p_1 = 0$.) By "the probability of an
event" we shall mean the measure of the corresponding $\Omega$-set. As this measure
has been defined, the probability that the nth trial results in a number j is $p_j$,
and the probability that one trial results in j, and another in k, is $p_j p_k$.
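
The product rule for sets of sequences with a prescribed beginning can be tried out on a small case. The sketch below is illustrative only: the fair-die choice of the $p_j$, the name `cylinder_measure`, and the use of exact fractions are our assumptions. It computes the measure of a set fixed by its first coordinates and checks two instances of additivity by enumeration.

```python
from fractions import Fraction
from itertools import product
from math import prod


def cylinder_measure(p, prefix):
    """Measure of the set of all sequences beginning with x_1 = prefix[0], x_2 = prefix[1], ..."""
    return prod(p[a] for a in prefix)


N = 6
p = {j: Fraction(1, N) for j in range(1, N + 1)}   # hypothetical choice: a "correct" die

# Probability that the first two trials both result in 6.
double_six = cylinder_measure(p, (6, 6))

# The N^3 sets fixed by their first three coordinates are disjoint and exhaust the space.
total = sum(cylinder_measure(p, w) for w in product(p, repeat=3))

# Splitting the set "x_1 = 2" according to the value of x_2 gives back its measure.
split = sum(cylinder_measure(p, (2, a)) for a in p)

print(double_six, total, split)
```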

¹ Cf. A. Kolmogoroff, Ergebnisse der Mathematik, Vol. 2, No. 3, Grundbegriffe der
Wahrscheinlichkeitsrechnung, where the most complete treatment of the approach to the
probability calculus from the standpoint of measure is given.







The justification of the above correspondence between events and $\Omega$-sets is
that certain mathematical theorems can be proved, filling out a picture on the
mathematical side which seems to be an approximation to reality, or rather an
abstraction of reality, close enough to the real picture to be helpful in prescribing
practical rules of statistical procedure. The following two theorems are important
ones, from this point of view. These two theorems depend in no way
on observed facts. They are stated and proved in the customary language of
modern analysis.

Theorem A: Let $j_n$ be the number of the first $n$ coordinates of the point
$\omega$: $(x_1, x_2, \ldots)$ which are equal to $j$, where $j$ is some integer ($1 \le j \le N$) which
will be kept fixed throughout the discussion. Then $0 \le j_n \le n$, and $j_n$ varies from
point to point on $\Omega$: $j_n = j_n(\omega)$ is a function of $\omega$, that is of the sequence $(x_1, x_2, \ldots)$.

When $n \to \infty$, $j_n/n$ has not a unique limit independent of the sequence
$(x_1, x_2, \ldots)$ under consideration. In fact if $\omega$ is the point $(k, k, \ldots)$, $j_n(\omega) = 0$
for all $n$, unless $j = k$; if $\omega$ is the point $(j, j, \ldots)$, $j_n(\omega) = n$ for all $n$. It is
simple to give examples of sequences $\omega$: $(x_1, x_2, \ldots)$ for which $j_n(\omega)/n$ oscillates
without approaching a limit, as $n \to \infty$. But Theorem A (usually called the
strong law of large numbers) states that there is a set of sequences, i.e. an $\omega$-set $S$,
of measure 0, such that



(1)  $$\lim_{n \to \infty} \frac{j_n(\omega)}{n} = p_j$$


unless $\omega$ is in $S$. In other words the sequences for which (1) is not true are
exceptional in the sense of measure theory. If a new choice $\{p'_j\}$ of $p_j$'s is made,
then if $p'_j \ne p_j$, the new exceptional set includes all the sequences which were
not exceptional before, since the limit in (1) becomes $p'_j$. Thus $S$ depends
essentially on $p_j$. Theorem A is a generalization of Bernoulli's classical theorem
which states in our language that the measure of the set of sequences
$\omega$: $(x_1, x_2, \ldots)$ for which

$$|j_n(\omega)/n - p_j| > \epsilon$$

approaches 0, as $n \to \infty$, for any positive $\epsilon$. Theorem A is stronger because it
states that there is actual convergence, whereas Bernoulli's theorem only concludes
that there is a kind of convergence on the average.
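The clustering that Theorem A describes can be watched numerically. The following is only a sketch under stated assumptions: the probabilities `p`, the seed and the checkpoints are illustrative choices, not values from the paper, and a finite simulation can suggest but never establish the limit statement (1).

```python
import random

def relative_frequencies(p, j, checkpoints, seed=0):
    """Simulate independent trials with P(result = k) = p[k-1] and return
    j_n / n at each n in `checkpoints`, where j_n counts results equal to j."""
    rng = random.Random(seed)           # fixed seed: an arbitrary choice
    outcomes = list(range(1, len(p) + 1))
    wanted = set(checkpoints)
    count, freqs = 0, {}
    for n in range(1, max(checkpoints) + 1):
        x = rng.choices(outcomes, weights=p)[0]
        if x == j:
            count += 1
        if n in wanted:
            freqs[n] = count / n
    return freqs

p = [0.2, 0.5, 0.3]                     # hypothetical p_1, p_2, p_3
freqs = relative_frequencies(p, j=2, checkpoints=[100, 10_000, 200_000])
for n, f in freqs.items():
    print(n, round(f, 4))
```

The printed ratios should settle near 0.5 as n grows, which is the practical counterpart of (1) for this one simulated sequence.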

Theorem A corresponds to certain observed facts, relating to the clustering
of "success ratios," giving rise to empirical numbers $\bar p_j$. If the statistician
wishes to apply his calculus to a given experiment (Problem II), he sets $p_j = \bar p_j$.
There has been frequent discussion of the problem of determining the $\bar p_j$.
This discussion of the $\bar p_j$ is sometimes held on so high a plane that the innocent
bystander may wonder to what purpose such abstract philosophic concepts could
possibly be put, besides that of stimulating further discussion on a still higher
plane. The principal purpose of this paper is to discuss Problem I, but a few
words on Problem II might not be out of place here. Almost everyone who is
going to use probability numbers, the $p_j$, for other than conversational purposes,
derives them in the same way. There is a judicious mixture of experiments
with reason founded on theory and experience. Thus if a coin is tossed by an 
experimenter who has examined the coin, and found that it had heads on one 
side but not on both, that it seemed balanced, and that (as a confirming check) 
tossing a hundred times gave around 50 heads, the experimenter would use 1/2
as the probability of obtaining heads in his further reasoning. Of course there
is no logic compelling this. The experimenter may have been fooled. A coin 
far out of balance may turn up 60 heads in 100 throws. But man must act, 
and the above procedure has been found useful, which is all that is desired. In 
many experiments, less reliance can be placed on a preliminary physical examina- 
tion of the experimental conditions, and more must be placed on the actual 
working out of the experiment, as in the analysis of machine products. In that 
case, the actual results must be examined with great care, before attempting 
to use the above mathematical set-up. It sometimes may even be possible to 
change the experimental conditions to make the mathematics applicable.^ In 
all cases, such mathematical theorems as Theorem A and the following Theo- 
rem B give the basis for applying the formal apparatus to practice. Indeed, 
the criterion of application includes the verification of special cases of the prac- 
tical versions of Theorems A and B. 

Theorem B: Let $f_n(x_1, \ldots, x_{n-1})$ ($n \ge 1$) be any function of the indicated
variables, except that we suppose $f_n$ only takes on the values 0, 1. Let $\omega$: $(x_1, x_2, \ldots)$
be a given point of $\Omega$. Let $n'$ be the number of the first $n$ integers $i$ such that
$f_i(x_1, \ldots, x_{i-1}) = 1$, and let $j'_n$ be the number of the first $n$ integers $i$ such that
$f_i(x_1, \ldots, x_{i-1}) = 1$, and $x_i = j$. Then $j'_n$, $n'$ are functions of $\omega$: $(x_1, x_2, \ldots)$.
If $f_1 \equiv f_2 \equiv \cdots \equiv 1$, $j'_n = j_n$, $n' = n$, where $j_n$ is as defined above. Suppose
that there is an $\Omega$-set $S_0$ of measure 0 such that $n' \to \infty$, as $n \to \infty$, unless $\omega \in S_0$.
Theorem B states that there is then an $\Omega$-set $S'$ of measure 0, such that if
$\omega$: $(x_1, x_2, \ldots)$ is not in $S'$,

(1')  $$\lim_{n \to \infty} \frac{j'_n(\omega)}{n'} = p_j.$$


(The set $S'$ will depend on the given functions $f_1, f_2, \ldots$ and on the $p_j$, but is
fixed, once these have been chosen.) This mathematical theorem corresponds
to certain observed facts (usually summarized by stating that no (successful)
system of play is possible). In fact, it states, in the language of practice, that
rejecting certain trials, using as a criterion of acceptance or rejection the results
of preceding trials, rejecting the $i$th trial if $f_i(x_1, \ldots, x_{i-1}) = 0$, does not affect
the outcome of a game of chance, or, more precisely, does not affect the validity
of the physical fact corresponding to Theorem A. If $f_1 \equiv f_2 \equiv \cdots \equiv 1$, (1')
becomes (1). The hypothesis that $n' \to \infty$ as $n \to \infty$ unless $\omega \in S_0$ is made to
insure that infinitely many trials will be accepted. As an example of the


’ Cf. W. A. Shewhart, Statistical Method from the Viewpoint of Quality Control,
Washington, 1939.
possible variety in the definition of the $f_i$, we might define $f_i$ as 1 if $x_{i-1} = N$,
and $f_i = 0$ otherwise, so trials are accepted only if the previous trial resulted in
the number $N$. Or much more complicated systems can easily be devised in
which the criterion of acceptance of the $n$th trial depends on a varying number
of the results of preceding trials. This theorem gives a mathematical counterpart
to the physical idea of the mutual independence of repeated trials.
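The place selection just described (accept trial $i$ only when trial $i-1$ resulted in $N$) can be tried on simulated data. This is a sketch only: the probabilities, the seed, the number of trials, and the convention that the very first trial is never accepted are all assumptions made for the illustration.

```python
import random

def selected_frequency(p, j, n_trials, seed=1):
    """Apply the selection rule f_i = 1 iff x_{i-1} = N to simulated
    independent trials and return (n', j'_n / n'): the number of accepted
    trials and the relative frequency of the result j among them."""
    rng = random.Random(seed)
    N = len(p)
    outcomes = list(range(1, N + 1))
    xs = rng.choices(outcomes, weights=p, k=n_trials)
    accepted = hits = 0
    for i in range(1, n_trials):        # the first trial has no predecessor
        if xs[i - 1] == N:              # f_i = 1: previous result was N
            accepted += 1
            if xs[i] == j:
                hits += 1
    return accepted, hits / accepted

p = [0.2, 0.5, 0.3]                     # hypothetical p_1, p_2, p_3
n_accepted, ratio = selected_frequency(p, j=2, n_trials=300_000)
print(n_accepted, round(ratio, 4))
```

Roughly three trials in ten are accepted here, and the frequency of the result 2 among them should still be close to $p_2 = 0.5$, in line with (1').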

To summarize, mathematically (Problem I) the study has been reduced to
that of the measure properties of $\Omega$. This can be considered independently of
any physical correspondence. The physical correspondence (Problem II) makes
any event $\mathcal{E}$ correspond to a point set $E$ of $\Omega$; the "probability of $\mathcal{E}$" becomes
the measure of $E$. Thus "the probability that the result of the first experiment
is 3" becomes the measure of the set of sequences $(x_1, x_2, \ldots)$ beginning with
$x_1 = 3$. We have given no sharp definition of probability as a physical concept.
If the above mathematical set-up, after translation, using some set of $p_j$'s,
seems to fit a given physical set-up, any event will be said to have as its probability,
the measure of the corresponding $\Omega$-set. We have attempted to give no
intrinsic a priori definition of the probability of an event; such a definition is
quite unnecessary for our purposes. All that was required was a basis for prescribing
the usual statistical procedures, and we have described such a basis.

In the above example, there would have been no new difficulty introduced
if the $x_n$ were not restricted to integral values, but allowed to take on any
numerical values. The general point $\omega$: $(x_1, x_2, \ldots)$ of $\Omega$ would now be any
sequence of real numbers. Instead of choosing the numbers $p_1, \ldots, p_N$, we
choose a "distribution function" $F(x)$, a monotone function with the following
properties:

$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1, \qquad F(x - 0) = F(x).$$

Measure on $\Omega$ is defined as follows. The set of all sequences beginning with $x_1$
such that $a \le x_1 < b$ is given measure $F(b) - F(a)$. (The number $F(b)$ is
called "the probability that $x_1 < b$.") More generally, the measure of the set
of all sequences $(x_1, x_2, \ldots)$ beginning with $x_1, \ldots, x_n$, such that
$a_j \le x_j < b_j$, $j = 1, \ldots, n$, is defined as $\prod_{j=1}^{n} [F(b_j) - F(a_j)]$. Thus if $F(x)$
defines a simple rectangular distribution: $F(x) = 0$ for $x < 0$, $F(x) = x$ for $0 \le x \le 1$,
$F(x) = 1$ for $x > 1$, $\Omega$-measure becomes (infinite dimensional) volume in the
(infinite dimensional) unit cube. The correspondence (Problem II) between
events and point sets of $\Omega$ is defined just as before. Sometimes it may be useful,
in considering experiments giving rise to pairs of numbers, to let each $x_n$ be a
pair of numbers so that $\omega$ becomes a sequence of points of a plane instead of a
sequence of points of a line. In all cases there are mathematical theorems
true of the resulting $\Omega$ which guide us (Problem II) in deciding just how the
$\Omega$-measure is to be defined, that is, how $F(x)$ is to be defined, in dealing with a
given practical problem. But the essential point is this. Once $\Omega$-measure has
been defined, no changes or further hypotheses are possible or necessary. All
relevant probability questions are answerable. Thus consider a question of the
following type: if the experiments are grouped in some way,' with what probability
will the groups have some given regularity property?* The question singles
out a set $E$ of sequences of $\Omega$ and asks: what is the measure of $E$? The problem
may or may not be difficult mathematically,' depending on the grouping, but
the original definition of measure on $\Omega$ needs no enlargement to answer it.

Technically, the mathematics has become the mathematics of a special type
of measure defined on a space of infinitely many dimensions. If, however, there
is an integer $\nu$ such that only at most $\nu$ experiments are to be considered, we
need only consider the $\nu$-dimensional space of points $(x_1, \ldots, x_\nu)$, defining
measure in this space in the same way as on $\Omega$. Thus if $x_n$ has the rectangular
distribution defined above, the measure in $(x_1, \ldots, x_\nu)$-space becomes ordinary
$\nu$-dimensional volume in the unit cube. Perhaps the most common measure a
statistician considers is that in which the measure of an $(x_1, \ldots, x_\nu)$-set $E$
becomes "the probability that the point $(x_1, \ldots, x_\nu)$ representing an independent
sample of $\nu$ from a normal distribution of mean 0 and variance $\sigma^2$"
will lie in $E$:

(2)  $$P\{E\} = \sigma^{-\nu}(2\pi)^{-\nu/2} \int \cdots \int_E e^{-(x_1^2 + \cdots + x_\nu^2)/(2\sigma^2)}\,dx_1 \cdots dx_\nu.$$

This example makes it obvious that the statistician is always doing measure
theory, even though he may not state that fact explicitly. If the number of
experiments has no upper bound conceptually (mathematically, when the number
of dimensions $\nu$ may increase without limit, as in Theorems A, B), it is much
more convenient to use the space $\Omega$ in terms of which experiments with varying
numbers of trials can be considered simultaneously. The classical proofs of
probability theorems, such as Bernoulli's theorem (the law of large numbers)
are perfectly correct. If the "probability of an event" is interpreted as the
measure of a set, these proofs do not even need verbal changes. There can be
no question of the need for any axiomatic development beyond that necessary
for measure theory, and the probability calculus can lead to no contradiction,
unless the theory of measure is faulty.

It is customary for probability theorists to stop their discussions when the
present stage is reached, so that the beginnings of a formal calculus have been 
constructed to deal with a repetition of independent experiments, conducted 

* A grouping is necessary, for example, when two players are playing a game in which
two out of three wins in the trials win a game. The trials are then grouped into successive
groups of two or three, depending on how they come out.

* Continuing the preceding note, the question might be: will the ratio (games won by
player a)/(games played) approach a limit with probability 1, that is, for all of the original
sequences $\{x_n\}$ except possibly some forming a set of measure 0?

* The answer to the question of the preceding notes is simple. If $p$ is the probability
that player a wins a trial, the ratio in question approaches $p^3 + 3p^2(1 - p)$, the probability
that a wins a game, with probability 1.
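The value quoted in the last note, $p^3 + 3p^2(1 - p)$, can be checked by direct enumeration. The sketch below assumes that all three trials are (conceptually) played even when the game is already decided after two, which does not change the probability of winning; the value $p = 3/5$ is an arbitrary illustration.

```python
from fractions import Fraction
from itertools import product

def win_probability(p):
    """Probability that player a wins at least two of three independent
    trials, found by enumerating all 2^3 win/lose patterns."""
    total = Fraction(0)
    for pattern in product([1, 0], repeat=3):
        if sum(pattern) >= 2:
            prob = Fraction(1)
            for won in pattern:
                prob *= p if won else 1 - p
            total += prob
    return total

p = Fraction(3, 5)                      # hypothetical single-trial probability
enumerated = win_probability(p)
formula = p**3 + 3 * p**2 * (1 - p)
print(enumerated, formula)
```

Both expressions are exact rational numbers, so they can be compared for equality rather than to within a tolerance.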
under the same conditions. Perhaps this is because of the following widely held
syllogism: probability is something dealing with random events; random events
are events having no influence on each other; therefore .... Unfortunately
mathematicians and statisticians must deal with many problems involving dependent
probabilities, whose solutions require the most delicate and careful
applications of modern analysis. The rudimentary calculi which the outsiders
find esthetically or philosophically pleasing are usually either insufferably awkward
or completely insufficient for the needs of professionals. There is a strange
situation, which one observer has facetiously described somewhat as follows: it
is true with probability 1 that the technical workers in probability use the
measure approach, but that the writers on "probability in general," descendants
of Carlyle's professor, do not consider this approach worth much more than a
passing remark.* The following pages outline how our previous treatment is
generalized to deal with problems in which it is desirable to have the distribution
of $x_j$ vary with $j$ (so that physically the experiments are no longer the same),
and in which the $x_j$ do not have to correspond to the results of independent
experiments. Some attempt will also be made to show how the modern mathematical
theory of real functions is applied to the probability calculus.

Let $x_j = x_j(\omega)$ be the $j$th coordinate of the point $\omega$: $(x_1, x_2, \ldots)$. Then as
the sequence $\omega$: $(x_1, x_2, \ldots)$ varies, $x_j$ does also: $x_j(\omega)$ is a function of $\omega$. The
functions $x_1(\omega), x_2(\omega), \ldots$ are functions defined on $\Omega$, an abstract space on which
a measure has been defined. Moreover $\Omega$-measure has been defined in such a
way that the $\Omega$-set for which $x_j(\omega) < K$ ($j$, $K$ fixed) is an $\Omega$-set whose measure
has been defined. (This set is composed of all sequences $(x_1, x_2, \ldots)$ whose
$j$th coordinate is $< K$, and the measure is $F(K)$, using our last definition of
$\Omega$-measure.) In the terminology of measure theory, $x_j(\omega)$ is thus a measurable
function. The study of the measure relations of $\Omega$, and this is the whole of our
probability calculus, can be considered, from this point of view, as the study of
the properties of a sequence of measurable functions, one with very special
properties, as we shall see, defined on some space. A measurable function
defined on $\Omega$ is usually called a chance variable, in the theory of probability.
(This terminology is somewhat dangerous, because it mixes Problems I and II.)
The whole apparatus of modern real variable theory is applicable to these
chance variables. Thus if $f(\omega)$ is a chance variable (measurable function of $\omega$)
(physically, a function of the observations), it is customary to define a number
called its expectation. This number is simply the integral of $f(\omega)$, with respect
to the given $\Omega$-measure. The fact that the expectation of the sum of two chance
variables is the sum of their expectations is simply the familiar theorem that the
integral of the sum of two functions is the sum of their integrals. Let $S(j, K)$
be the $\Omega$-set defined by the inequality $x_j < K$. Up to now we have supposed

' This analysis, like every other probability statement, is only an approximation to
reality, but a fairly close one.
that the measure of $S(j, K)$ is independent of $j$, that is that the distribution of $x_j$
is independent of $j$. We have also supposed that'

(3)  $$P\{S(1, K_1) \cdots S(n, K_n)\} = P\{S(1, K_1)\} \cdots P\{S(n, K_n)\}$$

for any positive integer $n$, and numbers $K_1, \ldots, K_n$. That is, we have supposed
that $x_1(\omega), x_2(\omega), \ldots$ are mutually independent chance variables.* In
fact probability measure on $\Omega$ has been defined just to make the foregoing two
facts true. Mutual independence is a very strong hypothesis to impose on a
sequence of functions. In many probability problems (Markoff chains for
example), more general measures must be defined on $\Omega$. The sequence $x_1(\omega)$,
$x_2(\omega), \ldots$ whose properties are those of $\Omega$-measure, is then no longer a sequence
of independent functions, and the distribution of $x_j$ can vary with $j$.

At this level, the study becomes the study of any sequence of measurable
functions, defined on some space of total measure 1. If $f$, $g$ are given chance
variables, they may turn out to be independent. In that case the theorem that
the expectation of their product is the product of their expectations becomes,
when translated into mathematical language, the familiar theorem that

$$\iint f(x) g(y)\,dx\,dy = \int f(x)\,dx \int g(y)\,dy.$$

The mathematical theorems are not simply analogues of the probability theorems;
they themselves are those theorems. When stated mathematically, the
probability theorems need no proof: they need only recognition as standard
results.
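The product identity above can be illustrated numerically on the unit square with ordinary (Lebesgue) measure. The sketch below is an assumption-laden illustration: the functions `f` and `g`, the midpoint rule and the grid size are arbitrary choices, and the two coordinates play the role of independent, rectangularly distributed chance variables.

```python
import math

def midpoint_integral(h, n=400):
    """Midpoint-rule approximation of the integral of h over [0, 1]."""
    return sum(h((k + 0.5) / n) for k in range(n)) / n

def midpoint_double_integral(h2, n=400):
    """Midpoint-rule approximation of the integral of h2 over the unit square."""
    return sum(h2((a + 0.5) / n, (b + 0.5) / n)
               for a in range(n) for b in range(n)) / n**2

f = lambda x: x * x                     # illustrative integrands only
g = lambda y: math.exp(y)

lhs = midpoint_double_integral(lambda x, y: f(x) * g(y))
rhs = midpoint_integral(f) * midpoint_integral(g)
print(lhs, rhs)
```

On a product grid the double sum factors exactly, so the two numbers agree up to rounding; both approximate $\tfrac13 (e - 1)$.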

Empirical needs suggest that certain functions called conditional probability 
distributions, and conditional expectations, should be defined in a certain way. 
This is possible, as a formal matter,® and the theorems then proved about these
functions give them their usual meaning when translated into practical language.
These functions are extremely useful tools in dealing with mutually dependent 
(that is not independent) chance variables. 

The above approach is easily generalized to the stage needed in the study of
Brownian movements or of time series, in which, instead of the proper initial 

^ $P\{S\}$ was defined as the measure of the $\Omega$-set $S$.

* The $n$ chance variables $f_1(\omega), \ldots, f_n(\omega)$ are said to be independent if for every
set of $n$ numbers $K_1, \ldots, K_n$, the following equality is true.

$$P\{f_j(\omega) < K_j,\ j = 1, \ldots, n\} = \prod_{j=1}^{n} P\{f_j(\omega) < K_j\}$$

where $P\{\cdots\}$ denotes the $\Omega$-measure of the $\Omega$-set defined by the conditions in the braces.
Thus in the example of a normal distribution in $\nu$ dimensions given above, $x_1, \ldots, x_\nu$
are independent functions on the space of $\nu$ dimensions, a fact which follows readily from
the fact that the $\nu$-dimensional density function is the product of $\nu$ functions of the separate
variables.

* Cf. Kolmogoroff, loc. cit. 
abstraction being a sequence $\{x_n\}$ of numbers, we have a one-parameter family
$\{x_t\}$ ($t$ takes on all real values). The number $x_t$ may, for example, be thought
of as the $x$-coordinate of a particle at time $t$. There is no difference in principle
here: $\Omega$ is now the space of functions of $t$, instead of the space of sequences, that
is functions of $n$. From the other point of view, instead of studying the properties
of a sequence of measurable functions, it becomes necessary to study the
properties of a one-parameter family of measurable functions.



DISCUSSION OF PAPERS ON PROBABILITY THEORY 

By R. von Mises and J. L. Doob

1. Comments by R. von Mises. Professor Doob outlines a new theory of
probability starting with the following three basic conceptions. First, he uses
the notion of an infinite sequence of trials or better: of an infinite sequence of
numbers $x_1, x_2, x_3, \ldots$ which can be considered as the outcomes of infinitely
repeated uniform experiments. Second, he introduces (in his Theorem A) the
limit of the relative frequency of a particular outcome $a$. Third, (in his Theorem B)
the notion of place selection defined by a sequence of functions
$f_n(x_1, x_2, \ldots, x_{n-1})$ is employed. All these three concepts are completely
strange to the so called classical theory as developed by Bernoulli, Laplace,
Poisson, etc. They have been introduced and made the corner stone of probability
theory in my papers published since 1919. I daresay that in no probability
investigation before 1919 were any of those notions even mentioned.

This concerns what Professor Doob calls the Problem I or the purely mathe- 
matical aspect of the question. As to his Problem II or the relationship between 
the formal calculus and real facts Professor Doob stresses that the actual values 
for probabilities that enter as data into a particular argument have to be drawn 
from long, finite sequences of experiments. This is in complete accordance
with the standpoint of my theory and in strict contradiction to the classical
conception which knows only "a priori" probabilities determined by "equally
likely cases."

In both theories, Professor Doob's and mine (not in the classical), a mathematical
model or picture is associated with a long sequence of uniform
experiments. These models are different in both theories. My model (the
"kollektiv") consists of one infinite sequence $x_1, x_2, x_3, \ldots$ in which the
limit of the relative frequency of each possible outcome $a$ exists and is indifferent
to a place selection; the value of this limit is called the probability of $a$.

On the other hand Professor Doob's model implies all logically possible sequences
which form a space $\Omega$ and he shows that in this space a measure function
can be introduced which fulfills the following conditions: (1) If $m$ is a positive
integer, the set of all sequences the $m$th element of which is $a$ has a measure $p_a$
independent of $m$; (2) the set of all sequences in which the relative frequency
of $a$-results has either no limit or a limit different from $p_a$ has measure zero; (3) if $S$ is any
place selection, the set of all sequences $\omega$ for which the relative frequency of $a$
in $S(\omega)$ has either no limit or a limit different from $p_a$ likewise has measure zero; this value
$p_a$ is called the probability of the outcome $a$. It then can be shown that a
probability in this sense can be ascribed to certain events, i.e. to certain types
of experiments which in some way are connected with the sequence of basic
experiments. E.g. if the original sequence consists of the single successive
tossings of a die, the derived sequence may consist of pairs of tossings with the
sum of the outcoming points as new value of $a$. The new probabilities $p'_a$ are
found as measures of certain sets in the original measure system established in $\Omega$.

There is no doubt that the model used by Professor Doob for representing 
empirical sequences of uniform experiments is logically consistent. Its practical 
usefulness depends on how the usual problems of combining different kollektivs 
and so on can be settled within this scheme. This has to be shown in detail. 
It seems to me that my conception is simpler in its application and closer to 
reality, while his model may be considered more satisfactory from a logical 
standpoint since it avoids the difficulties connected with the concept of "all
place selections." At any rate, however, there is no contradiction or irreconcilable
contrast: both theories are essentially statistical or frequency theories,
equally far from the classical conception based on "equally likely cases." In
both theories probabilities are, of course, measures of sets. 

2. Comments by J. L. Doob. It is perhaps unfortunate that Professor von
Mises' treatment of probability problems, based on typical sequences ("collectives,"
"admissible numbers"), is commonly called the "frequency theory."^ It
is clear to any reader of our papers (identified as M and D below) that the idea
of frequency, at least in the discussion of the relation of mathematics to practice,
is no more fundamental to one approach than to the other. In one mathematical
treatment frequency notions first appear in the theorems, whereas in
the other they first appear in the axioms; but they appear in both. The principal
objection the measure advocates have to the frequency approach is that it is 
awkward mathematically. Anyone who doubts this awkwardness need only 
examine various books published recently, using this approach, to see what a 
lot of fussy detail is involved merely in proving such elementary results as the 
Tchebycheff inequality or the Bernoulli theorem. One author considers it necessary
to have his chance variables so restricted that if $x$ is a chance variable, the
event $x < k$ has a probability assigned to it only if $k$ is not in some exceptional
set, which may be infinite. To take another example, consider the coin tossing
game discussed in both M and D, in which two out of three wins at tosses win
a game. Apparently the probability analysis of this game is somewhat difficult
in terms of the frequency theory. As the quite elementary treatment outlined
in D shows, there is no difficulty involved, using the measure approach. The
question is simple: a set of chance variables is given (corresponding to the
original tosses); a new set is determined from them (corresponding to the
grouping into games). Only elementary algebraic manipulation is required to
verify that the new chance variables are mutually independent in the mathematical
sense (Cf. D), and have the same distribution, so the law of large
numbers is applicable. Professor von Mises considers that the measure theory
cannot handle this problem; I on the other hand consider that this problem
exhibits the mathematical disadvantages of the frequency theory.

^ This identifying name will be used below also.





The frequency theory reduces everything to the study of sequences of mutually
independent chance variables, having a common distribution. "Probability
theory is the study of the transformations of admissible numbers" writes Professor
von Mises. This point of view is extremely narrow. Many problems of
probability, say those involved in time series, can only be reduced in a most
artificial way to the study of a sequence of mutually independent chance variables,
and the actual study is not helped by this reduction, which is merely a
tour de force.

It is claimed in M that the axioms of measure theory only describe the distri- 
bution within one collective (M, p. 00). This statement seems to mean that 
only the measure relations (using the notation of D) of the first coordinate 
function $x_1(\omega)$ can be discussed in the measure theory, that is only probabilities
of the type: the probability that $x_1 < k$ (in the language of practice, "the
probability that the result of the first experiment is less than $k$") are discussed.
Actually, however, (Cf. D) the measure theory can discuss any number of experiments
simultaneously, using the appropriate space $\Omega$.

Many of the debates between the advocates of the various probability theories
have been wasted, because some of the debaters talk mathematics, others physics.
With this in mind, I should like to stress again^ that (except for a few philosophically
inclined Englishmen) everyone calculates probability numbers in the
same way: a combination of reasoning based on experience and helped by
theory, with examination of the experimental conditions and the results of trials.
Frequency considerations necessarily play a large part. The fact that almost
everyone calculates probability numbers in the same way does not alter the
fact that one mathematical theory may be more useful or convenient than
another in dealing with these probability numbers.

In closing, it seems proper to call attention to what the measure advocates 
consider the real services and contributions of the approach of Professor von 
Mises. Professor von Mises was the first to stress the importance of the second 
of two fundamental generalizations of experience in dealing with repeated mu- 
tually independent experiments of the same character: (1) the clustering of 
success ratios and (2) the fact that this clustering is unaffected by a system of 
rejection as described in M and D. These two generalizations of experience are 
certainly fundamental. The only point under discussion here is how such gen- 
eralizations are to be put into a mathematical setting. The original such setting 
of Professor von Mises was criticized as not really mathematical. The setting
now proposed by Copeland and others is criticized by the measure advocates as 
mathematically inflexible and clumsy. But it is significant that even in a treat- 
ment of the measure approach, as in D, it was felt essential to stress the mathe- 
matical interpretation of the two empirical generalizations of Professor von 
Mises. In the terminology of D, the measure advocates consider the contribu- 
tion of Professor von Mises' approach to be a contribution to a solution of 
Problem II, not to Problem I, the mathematical problem. 

* We are not talking mathematics now, but the application of mathematics. 
1. Introduction. Existing literature on the problem of calculating the incomplete
Beta function

(1.1)  $$B_x(p, q) = \int_0^x x^{p-1}(1 - x)^{q-1}\,dx, \qquad 0 < x < 1,\ p > 0,\ q > 0,$$

and the levels of significance of Fisher's z [1] leave further work to be done.
Müller's continued fraction and a new continued fraction are shown to possess
complementary features covering the range of $B_x(p, q)$ for all values of $x$, $p$, $q$.
Previous methods of computing $I_x(p, q) = B_x(p, q)/B(p, q)$ are given in [2], [5],
[6], [8], [10], [13], [14], [15].

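Müller's continued fraction is

(1.2)  $$I_x(p, q) = \frac{\Gamma(p+q)\,x^p(1-x)^{q-1}}{\Gamma(p+1)\Gamma(q)} \left[ \frac{b_1}{1 +}\ \frac{b_2}{1 +}\ \frac{b_3}{1 +} \cdots \right],$$

where

$$b_1 = 1, \qquad b_{2s} = -\frac{(p+s-1)(q-s)}{(p+2s-2)(p+2s-1)} \cdot \frac{x}{1-x}, \qquad b_{2s+1} = \frac{s(p+q+s-1)}{(p+2s-1)(p+2s)} \cdot \frac{x}{1-x}.$$
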

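A convergent infinite series $1 + \sum_{n=1}^{\infty} d_n x^n$ can be converted into an infinite
continued fraction of the form

$$\frac{1}{1 +}\ \frac{c_1 x}{1 +}\ \frac{c_2 x}{1 +} \cdots,$$

where [4], [9] p. 304,

(1.3)  $$c_1 = -\beta_1, \qquad c_s = -\frac{\beta_s\,\beta_{s-3}}{\beta_{s-1}\,\beta_{s-2}}, \quad s \ge 2 \qquad (\beta_{-1} = \beta_0 = 1),$$
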
' Presented at a meeting of the American Mathematical Society, October 28, 1939, New 
York City. 
where

$$\beta_{2s} = \begin{vmatrix} 1 & d_1 & \cdots & d_s \\ d_1 & d_2 & \cdots & d_{s+1} \\ \vdots & \vdots & & \vdots \\ d_s & d_{s+1} & \cdots & d_{2s} \end{vmatrix}, \qquad \beta_{2s+1} = \begin{vmatrix} d_1 & d_2 & \cdots & d_{s+1} \\ d_2 & d_3 & \cdots & d_{s+2} \\ \vdots & \vdots & & \vdots \\ d_{s+1} & d_{s+2} & \cdots & d_{2s+1} \end{vmatrix},$$

$$\beta_{2s} \neq 0, \qquad \beta_{2s+1} \neq 0.$$

The infinite continued fraction found in this manner is called the corresponding
continued fraction and the power series is said to be semi-normal if $\beta_{2s} \neq 0$,
$\beta_{2s+1} \neq 0$.
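The conversion of a power series into its corresponding continued fraction can be tried out in exact arithmetic. The sketch below rests on assumptions that should be checked against [9]: it takes the determinants to be the Hankel determinants $\beta_{2s} = \det(d_{i+j})$, $\beta_{2s+1} = \det(d_{i+j+1})$ ($0 \le i, j \le s$, $d_0 = 1$), uses $c_1 = -\beta_1$, $c_s = -\beta_s\beta_{s-3}/(\beta_{s-1}\beta_{s-2})$ with $\beta_{-1} = \beta_0 = 1$, and applies it to the series $d_{r+1} = \prod_{i=0}^{r} (p+q+i)/(p+1+i)$. All function names and the values of `p` and `q` are illustrative.

```python
from fractions import Fraction

def det(m):
    """Determinant of a square matrix of Fractions by Gaussian elimination."""
    m = [row[:] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return sign * result

def beta(k, d):
    """Assumed convention: beta(2s) = det(d[i+j]), beta(2s+1) = det(d[i+j+1]),
    0 <= i, j <= s, with d[0] = 1 and beta(-1) = beta(0) = 1."""
    if k <= 0:
        return Fraction(1)
    s, shift = divmod(k, 2)
    return det([[d[i + j + shift] for j in range(s + 1)] for i in range(s + 1)])

def partial_numerators(d, count):
    """c_1..c_count of 1/(1 + c_1 x/(1 + c_2 x/(1 + ...)))."""
    cs = [-beta(1, d)]
    for s in range(2, count + 1):
        cs.append(-beta(s, d) * beta(s - 3, d) / (beta(s - 1, d) * beta(s - 2, d)))
    return cs

def series_coefficients(p, q, count):
    """d_0 = 1 and d_{r+1} = d_r (p + q + r)/(p + 1 + r)."""
    d = [Fraction(1)]
    for r in range(count):
        d.append(d[-1] * (p + q + r) / (p + 1 + r))
    return d

def closed_form(p, q, n):
    """Closed-form c_n for the series above (odd n = 2s+1, even n = 2s)."""
    s, odd = divmod(n, 2)
    if odd:
        return -(p + s) * (p + q + s) / ((p + 2 * s) * (p + 2 * s + 1))
    return s * (q - s) / ((p + 2 * s - 1) * (p + 2 * s))

p, q = Fraction(3, 2), Fraction(5, 2)   # non-integer q keeps every beta nonzero
d = series_coefficients(p, q, 6)
cs = partial_numerators(d, 6)
print(cs)
```

Non-integer `q` is chosen on purpose: for integer `q` one of the partial numerators vanishes, the series is no longer semi-normal, and the ratio of determinants above breaks down.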


2. A new continued fraction. Müller found his continued fraction by converting
in the manner of the preceding paragraph

(2.1)  $$I_x(p, q) = \frac{\Gamma(p+q)\,x^p(1-x)^{q-1}}{\Gamma(p+1)\Gamma(q)} \left\{ 1 + \sum_{r=0}^{\infty} \frac{(q-1)(q-2)\cdots(q-r-1)}{(p+1)(p+2)\cdots(p+r+1)} \left(\frac{x}{1-x}\right)^{r+1} \right\}, \qquad x < \tfrac12.$$

We convert

(2.2)  $$I_x(p, q) = \frac{\Gamma(p+q)\,x^p(1-x)^{q}}{\Gamma(p+1)\Gamma(q)} \left\{ 1 + \sum_{r=0}^{\infty} \frac{(p+q)(p+q+1)\cdots(p+q+r)}{(p+1)(p+2)\cdots(p+r+1)}\, x^{r+1} \right\}, \qquad 0 < x < 1.$$


Consequently

$$\beta_1 = \frac{p+q}{p+1}, \qquad \beta_2 = \frac{(p+q)(1-q)}{(p+1)^2(p+2)},$$

$$\frac{\beta_{2s+1}}{\beta_{2s}} = \frac{(p+q)(p+q+1)\cdots(p+q+s-1)(p+q+s)}{(p+s+1)(p+s+2)\cdots(p+2s)(p+2s+1)},$$

$$\frac{\beta_{2s+2}}{\beta_{2s+1}} = \frac{(1-q)(2-q)\cdots(s-q)(s+1-q)\,(s+1)!}{(p+1)(p+2)\cdots(p+2s+1)(p+2s+2)},$$

$$c_{2s+1} = -\frac{(p+s)(p+q+s)}{(p+2s)(p+2s+1)}, \qquad c_{2s} = \frac{s(q-s)}{(p+2s-1)(p+2s)},$$

and

(2.3)  $$I_x(p, q) = \frac{\Gamma(p+q)\,x^p(1-x)^{q}}{\Gamma(p+1)\Gamma(q)} \left[ \frac{1}{1 +}\ \frac{C_1}{1 +}\ \frac{C_2}{1 +} \cdots \right],$$

where $C_s = c_s x$. By well known theorems due to Van Vleck [12] and Perron
[9] p. 347 we find (1.2) converges for —1 < x < and (2.3) converges for 
— 00 < X < 1, and in the neighborhood of zero (2.2) equals (2.3). The region 
of equivalence of the series and the fraction may be extended by the following 
argument. Let the infinite series be terminated at some arbitrary point which
gives the desired accuracy. Then the continued fraction of the corresponding
type represents this finite series, is finite and gives the result within the desired
accuracy. The new continued fraction may also be derived by use of the
hypergeometric series [9] p. 348. A special case of (2.3) was given by Markoff [3],
pp. 135-41, [11] pp. 53-55, who applied the result only to the binomial
distribution. The associated continued fraction provides more rapid convergence than
the corresponding continued fraction. The associated continued fraction is
found by means of the hypergeometric series [9] p. 331, p. 348:

r (n /A = r (p + q)3^(.i - x )"! hx hx^ . 

* ’ r(p + i)r(g) \ 1 H- Zix+ 1 + /*x+ 1 + i»x-\r 


kl = 


P + 9 




(2.4) 


P + 2 


fca+1 — 


L+1 = 


(p + 2« 



(p *+ 2«)(p + 2« + 1) 


p+ r 

g (g - g)(p + 9 + «)_ 

ij(p + 28np + 28 + 1 )» 

- (p_+_« + I)(p_'+L^ 

(p+*2r+ 2)(p + 2s+ if ’ 


8^1. 


The disadvantage of (2.4) lies in the unwieldy form of computation. For prop- 
erties of an associated continued fraction and the corresponding continued 
fraction in connection with convergence and the Taylor series reference is made 
to [9] p. 331 and pp. 302-303. 


3. Properties of the corresponding continued fraction. Müller and Soper
[5], [10], pointed out the inadvisability of integration through the mode
$x = \frac{p-1}{p+q-2}$. In such cases we change $I_x(p, q)$ to $I_{1-x}(q, p)$,
since $I_x(p, q) = 1 - I_{1-x}(q, p)$. Müller has shown
for his continued fraction that if we do not integrate through the mode (we
assume this in the remainder of the paragraph), convergents 2, 3, 6, 7, etc.,
will be greater than the true value and the remaining convergents will be less
than the true value provided q is an integer. However, if q is not an integer,
and is small (q < 20), it may happen that all convergents are above the true
value. In such cases we may consider whether Müller's continued fraction may
apply by estimating the remainder $I_x(p+s, q-s)$, after s reductions by parts
[10].

For the new continued fraction also

$$|C_{2s}| < \left|\frac{s(q-s)}{(p+2s-1)(p+2s)}\right|\cdot\frac{p-1}{p+q-2}, \qquad |C_{2s+1}| < \frac{(p+s)(p+q+s)(p-1)}{(p+2s)(p+2s+1)(p+q-2)},$$
and $C_{2s+1} < 0$; $C_{2s} > 0$ unless $s > q$ when $C_{2s} < 0$. If $C_{2s} > 0$ then the
convergents 2, 3, 6, 7, 10, 11, etc., will be above the true value and the other
convergents will be below the true value. If $C_{2s} < 0$, then all convergents will be
above the true value. In such cases, since a remainder for the continued
fraction has not been found, it seems best to estimate $I_x(p+s, q-s)$ to obtain
an idea of the error.


4. $I_x(p+s, q-s)$ and the equivalent continued fraction. Soper [10] has
given the remainder after s reductions by raising p. This will furnish an upper
bound of the error in the corresponding continued fraction after s convergents.
The remainder, when q - s is a negative integer, is approximately


(4.1) 


/.(p -H «, 9 - «) 


2 sin (9 - - ll/g^Cp + 9) 



where ( = ^ ^ * . 

P + 9 


Another approach is to use the equivalent continued fraction, for s - 1
convergents of the equivalent continued fraction reproduce exactly s terms of the
infinite series. The infinite series and the equivalent continued fraction for the
infinite series are alike in all respects except form. By [9] p. 210, we find that
the equivalent continued fraction for (2.3) is

$$W_1 = \frac{\gamma_1}{1+\gamma_1-}\,\frac{\gamma_2}{1+\gamma_2-}\,\frac{\gamma_3}{1+\gamma_3-}\,\frac{\gamma_4}{1+\gamma_4-}\cdots,$$

where

(4.2)  $$\gamma_1 = \frac{p+q}{p+1}\,x, \quad \gamma_2 = \frac{p+q+1}{p+2}\,x, \quad \gamma_3 = \frac{p+q+2}{p+3}\,x, \quad \ldots, \quad \gamma_r = \frac{p+q+r-1}{p+r}\,x,$$

and

$$I_x(p, q) = \frac{\Gamma(p+q)\,x^p(1-x)^{q}}{\Gamma(p+1)\Gamma(q)}\cdot\frac{1}{1-W_1}.$$


The equivalent continued fraction for Müller's continued fraction is given in
[5], p. 292.


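5. Numerical illustration. If $A_\nu$ and $B_\nu$ represent the numerator and the
denominator of the ν-th convergent of a continued fraction

$$\frac{a_1}{b_1+}\,\frac{a_2}{b_2+}\,\frac{a_3}{b_3+}\cdots,$$

then

(5.1)  $$A_\nu = b_\nu A_{\nu-1} + a_\nu A_{\nu-2}, \qquad B_\nu = b_\nu B_{\nu-1} + a_\nu B_{\nu-2}, \qquad \nu > 2.$$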





As an example we calculate $I_{.5}(2.5, 1.5)$, which could not be done by Müller's
continued fraction.


Convergent        A               B              A/B

     1        1               1               1
     2        1                .42857143      2.3333333
     3        1.015873016      .44444444      2.2857142
     4         .66233767       .29292929      2.2610838
     5         .64812966       .28671329      2.2605498
     6         .46471308       .20559441      2.2603391
     7         .441837914      .195475117     2.2603281
     8         .33105492       .14646345      2.2603245
     9         .30890766       .13666520      2.2603242
    10         .23762461       .10512856      2.2603240
    11         .21882154       .096809808     2.2603240


Using the value of the eleventh convergent we have $I_{.5}(2.5, 1.5) = .28779339$.
Pearson [7], p. 30, gives .2877934 and Soper [10], p. 32 gives .28779341.
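
The table of convergents can be reproduced with a few lines of code. The following Python sketch is an illustration only and is not part of the original paper: the function names are hypothetical, and it assumes that (2.3) is read as 1/(1+ C_1/(1+ C_2/(1+ ...))) with C_k = c_k x, so that in the recurrence for A and B the partial denominators are all 1, the first partial numerator is 1 and the ν-th partial numerator is C_{ν-1}. The starting values A_0 = 0, B_0 = 1 are the usual convention and are likewise an assumption.

```python
import math


def c_coeff(k, p, q):
    """Coefficient c_k of the continued fraction (2.3)."""
    s = k // 2
    if k % 2 == 1:  # k = 2s + 1
        return -(p + s) * (p + q + s) / ((p + 2 * s) * (p + 2 * s + 1))
    return s * (q - s) / ((p + 2 * s - 1) * (p + 2 * s))  # k = 2s


def convergents(p, q, x, n):
    """Ratios A/B of the first n convergents of 1/(1+ C_1/(1+ C_2/(1+ ...)))."""
    a_prev, b_prev = 0.0, 1.0  # A_0, B_0 (assumed starting values)
    a_cur, b_cur = 1.0, 1.0    # A_1, B_1: the first convergent is 1/1
    ratios = [a_cur / b_cur]
    for v in range(2, n + 1):
        ck = c_coeff(v - 1, p, q) * x  # partial numerator C_{v-1}
        a_prev, a_cur = a_cur, a_cur + ck * a_prev
        b_prev, b_cur = b_cur, b_cur + ck * b_prev
        ratios.append(a_cur / b_cur)
    return ratios


def incomplete_beta(p, q, x, n=11):
    """I_x(p, q) from the n-th convergent, times the factor in front of (2.3)."""
    log_factor = (math.lgamma(p + q) - math.lgamma(p + 1) - math.lgamma(q)
                  + p * math.log(x) + q * math.log(1 - x))
    return math.exp(log_factor) * convergents(p, q, x, n)[-1]


if __name__ == "__main__":
    for v, ratio in enumerate(convergents(2.5, 1.5, 0.5, 11), start=1):
        print(v, round(ratio, 7))
    print(round(incomplete_beta(2.5, 1.5, 0.5), 8))
```

With p = 2.5, q = 1.5 and x = 0.5 the second and third ratios are 7/3 and 16/7, which agree with the A/B column above; the argument x = 0.5 is inferred from that agreement.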

6. Discussion of the various methods. Müller's continued fraction encounters
difficulties when q is small due to the possible divergence of the series on which
it is based. In such cases the new continued fraction works admirably. Where
"reduction by parts" [10] is advisable it would seem Müller's results will be
better, while if "integration raising p" is preferable, then the new continued
fraction would be necessary. The other methods suggested in the past lacked
in some cases remainder terms; were in other cases too long; were feasible only
in a limited range; or were only approximations. I am particularly indebted
to Professor C. C. Craig under whose guidance this study was completed.
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


NOTE ON THE DISTRIBUTION OF NON-CENTRAL t
WITH AN APPLICATION


By Cecil C. Craig 
University of Michigan 


If we adopt the notation recently used by N. L. Johnson and B. L. Welch [1],
non-central t is defined by

$$t = \frac{z + \delta}{\sqrt{w}},$$

in which $\delta$ is a constant and z and w are independent variables, z being distributed
normally about zero with unit variance and w being distributed as $\chi^2/f$ in which f
is the number of degrees of freedom for $\chi^2$.

In the paper referred to Johnson and Welch discuss some applications of 
non-central t and give suitable tables calculated from the probability integral 
of the distribution of this variable. Previously tables of this probability in- 
tegral for the purpose of calculating the power of the t test had been given by 
J. Neyman [2] and Neyman and B. Tokarska [3]. 

It is the purpose of this note to call attention to a series expansion for the 
probability integral of non-central t which is simple in form and in most cases 
convenient for direct calculation. As an application of some intrinsic interest 
this series is used to compute in several numerical cases the power of a test 
proposed by E. J. G. Pitman [4] based on the randomization principle. 

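If for convenience we write,

$$\xi = \sqrt{w} \qquad (0 < \xi < \infty),$$

we have for the joint distribution of $z + \delta$ and $\xi$,

(1)  $$df(z+\delta, \xi) = \frac{2(f/2)^{f/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\,\xi^{f-1}e^{-f\xi^2/2}\,e^{-z^2/2}\,d(z+\delta)\,d\xi.$$

From this, putting $z + \delta = t\xi$,

(2)  $$df(t, \xi) = \frac{2(f/2)^{f/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\,\xi^{f}e^{-f\xi^2/2}\,e^{-(t\xi-\delta)^2/2}\,dt\,d\xi = \frac{2(f/2)^{f/2}e^{-\delta^2/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\sum_{r=0}^{\infty}\frac{(\delta t)^r}{r!}\,\xi^{f+r}e^{-(f+t^2)\xi^2/2}\,dt\,d\xi.$$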


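Now this series can be integrated term by term with respect to $\xi$ over its range
and we have,

(3)  $$df(t) = \frac{(f/2)^{f/2}e^{-\delta^2/2}}{\sqrt{2\pi}\,\Gamma(f/2)}\sum_{r=0}^{\infty}\frac{\Gamma[\tfrac12(f+r+1)]}{r!}\,(\delta t)^r\left(\frac{2}{f+t^2}\right)^{(f+r+1)/2}dt.$$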

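This series converges uniformly in any finite interval for t and it may be
integrated term by term over the entire range for t or over any part of it. In
particular, after some reduction, we get,

(4)  $$P(0 < t < t_0 \mid f, \delta) = \int_0^{t_0} df(t) = \frac{e^{-\delta^2/2}}{2}\sum_{r=0}^{\infty}\frac{(\delta/\sqrt{2})^r}{\Gamma(r/2+1)}\,I\!\left(\frac{r+1}{2}, \frac{f}{2}; \frac{t_0^2}{f+t_0^2}\right),$$

in which $I\!\left((r+1)/2, f/2; t_0^2/(f+t_0^2)\right)$ is the incomplete Beta-function in the
notation of Karl Pearson. Often what is wanted is

(5)  $$P(-t_0 < t < t_0) = e^{-\delta^2/2}\sum_{r=0}^{\infty}\frac{(\delta^2/2)^r}{r!}\,I\!\left(\frac{2r+1}{2}, \frac{f}{2}; \frac{t_0^2}{f+t_0^2}\right).$$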

Since the incomplete Beta-function is numerically less than unity it is seen
that the series (4) or (5) converges rapidly for moderate values of $\delta$ such as will
ordinarily occur in applications for small samples. The use of Pearson's tables
of I(p, q; x) will be convenient since interpolation will be required for only one
of the three arguments.

As an application let us consider the test proposed by Pitman in the paper
referred to above. Two independent samples, $x_1, x_2, \ldots, x_{N_1}$, and $y_1, y_2, \ldots, y_{N_2}$,
have been drawn and it is desired in the absence of any information about
the two populations from which the samples came to test the hypothesis that
they have equal means. A test based on what may be termed the principle of
randomization for this situation has been discussed by R. A. Fisher [5] and by
E. S. Pearson [6]. It is as follows: Let the combined sample of $N_1 + N_2$
observations be separated into sets of $N_1$ observations, $u_1, u_2, \ldots, u_{N_1}$, and
$N_2$ observations, $v_1, v_2, \ldots, v_{N_2}$, in all possible ways. For each such separation
let the numerical difference of the means, $|\bar u - \bar v|$, be the spread. Then for a
suitably chosen $\alpha > 0$, we will reject the hypothesis of equal means if fewer than
100α% of the ${}_{N_1+N_2}C_{N_1}$ spreads exceed $|\bar x - \bar y|$, and otherwise not. It is clear
that this test is fiducially valid independently of the populations actually sampled
in the sense that if it be consistently followed for all such samples, the proportion
of cases when the hypothesis is rejected when it is true will statistically
approach α.
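
The randomization procedure just described can be sketched directly by enumeration. The Python code below is an illustration under stated assumptions, not Pitman's or the author's own computation: the function names and the sample values in the usage lines are hypothetical, and the option `include_ties` is an added convenience. With `include_ties=False` the count follows the wording above literally (spreads that strictly exceed the observed one); counting ties as well is a more cautious variant for very small samples.

```python
from itertools import combinations


def spread_fraction(x, y, include_ties=False):
    """Fraction of all separations whose spread |mean(u) - mean(v)|
    exceeds (or, optionally, equals or exceeds) the observed |mean(x) - mean(y)|."""
    pooled = list(x) + list(y)
    n1 = len(x)
    n2 = len(pooled) - n1
    total = sum(pooled)
    observed = abs(sum(x) / n1 - sum(y) / n2)
    eps = 1e-12  # guards against rounding when two spreads are equal
    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n1):
        su = sum(pooled[i] for i in idx)
        spread = abs(su / n1 - (total - su) / n2)
        count += 1
        if include_ties:
            hits += spread >= observed - eps
        else:
            hits += spread > observed + eps
    return hits / count


def reject_equal_means(x, y, alpha, include_ties=False):
    """Reject when fewer than 100*alpha per cent of the spreads qualify."""
    return spread_fraction(x, y, include_ties) < alpha


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    print(spread_fraction([1, 3], [2, 4]))
    print(spread_fraction([1, 2, 3], [7, 8, 9], include_ties=True))
```

The enumeration visits every one of the separations, which is why the text turns to an approximation for all but very small samples.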

For all but very small samples it is very tedious to calculate the ${}_{N_1+N_2}C_{N_1}$
spreads and Pitman in his discussion shows that for quite moderate values of
$N_1$ and $N_2$ the quantity,

$$w = \frac{\dfrac{N_1N_2}{N_1+N_2}(\bar u-\bar v)^2}{\Sigma(u-\bar u)^2 + \Sigma(v-\bar v)^2 + \dfrac{N_1N_2}{N_1+N_2}(\bar u-\bar v)^2},$$

has a distribution which in all but very exceptional cases is quite well
approximated by a $B(\tfrac12, \tfrac12(N_1+N_2-2))$-function. That is, the distribution of w for
the ${}_{N_1+N_2}C_{N_1}$ spreads may for practical purposes be found from that of t, by a
simple transformation, with $N_1 + N_2 - 2$ degrees of freedom.

It seems pertinent to make some inquiry into the power of such a test, that is, 
to make an attempt to learn something about the probability that such a test 
will fail to reject the hypothesis of equal means when it is in fact false. To do 
this it is now necessary to specify the populations which have actually been 
sampled. If we suppose that these populations are normal with equal variances 
but with unequal means which, with no loss of generality, may be taken to be $\mu$
and $-\mu$ respectively, the probability integral of the distribution of non-central
t will give our answer.

If we set 


we have 

Also, 


t = v?r/{. 


^ _ (iVi - Ds! -I- (AT, - Del ATx + AT, - 2 _ / 

^ A \T n * Ar I Ar i 


JVi H- AT, - 2 


Ni + Nt 


Ni + Nt 




in which s’ is the usual estimate of the population variance a* based on f = 
Ni + N» — 2 degrees of freedom. Then 


= J A 

« r ATx 


ATxiV, 


+ Ar, 


and this is a central t if $\mu = -\mu = 0$, otherwise it is non-central. In the latter
case we write (the test is made on $\bar x - \bar y$),

$$t = \frac{(\bar x-\mu)-(\bar y+\mu)+2\mu}{s}\sqrt{\frac{N_1N_2}{N_1+N_2}} = \frac{z+\delta}{\sqrt{w}},$$

in which,

$$z = \frac{(\bar x-\mu)-(\bar y+\mu)}{\sigma}\sqrt{\frac{N_1N_2}{N_1+N_2}}, \qquad \sqrt{w} = \frac{s}{\sigma},$$

and

$$\delta = \frac{2\mu}{\sigma}\sqrt{\frac{N_1N_2}{N_1+N_2}}.$$


In applying Pitman's test for a given significance level α, one determines
whether or not

$$P(w > w_0) \le \alpha,$$

$w_0$ being the value of w calculated from the sample. This is equivalent to finding

$$P(t^2 > t_0^2),$$

for the proper f, in which

$$t_0^2 = \frac{f\,w_0}{1-w_0},$$

and this can be found from an ordinary table of the probability integral of the
t-distribution.

For a numerical example let $N_1 = N_2 = 10$ so that f = 18. If we adopt a 5%
significance level we have $t_0^2 = 2.101^2$ for the critical value. Let us suppose that
$\mu/\sigma = 0.1$, and calculate the probability that the hypothesis that $\mu = 0$ will be
rejected. We have $\delta^2/2 = 0.1$ and

$$\frac{t_0^2}{f+t_0^2} = 0.1969.$$

Then

$$P(t^2 < t_0^2) = e^{-0.1}\left[I(0.5, 9; 0.1969) + 0.1\,I(1.5, 9; 0.1969) + \frac{0.01}{2!}\,I(2.5, 9; 0.1969) + \cdots\right] = 0.9292.$$

Four terms of the series were enough to give this result. The probability of
rejecting the hypothesis in this case is thus 0.0708.
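
The numerical example can be retraced with a short program. The following Python sketch is an illustration and not the author's procedure: where the text uses Pearson's tables, the sketch evaluates the incomplete Beta-function by a continued fraction with a fixed, generously chosen number of terms (300, an assumption), and all function names are hypothetical. It sums the series for the probability that t lies between $-t_0$ and $t_0$, with weights $(\delta^2/2)^k/k!$ and first Beta argument k + 1/2.

```python
import math


def _cf(p, q, x, terms=300):
    """Evaluate 1/(1+ C_1/(1+ C_2/(1+ ...))) from the tail backwards."""
    f = 0.0
    for k in range(terms, 0, -1):
        s = k // 2
        if k % 2:  # k = 2s + 1
            c = -(p + s) * (p + q + s) / ((p + 2 * s) * (p + 2 * s + 1))
        else:      # k = 2s
            c = s * (q - s) / ((p + 2 * s - 1) * (p + 2 * s))
        f = c * x / (1.0 + f)
    return 1.0 / (1.0 + f)


def inc_beta(p, q, x):
    """Incomplete Beta-function ratio I(p, q; x) for 0 < x < 1."""
    if x > (p + 1) / (p + q + 2):  # use the other tail for faster convergence
        return 1.0 - inc_beta(q, p, 1.0 - x)
    log_factor = (math.lgamma(p + q) - math.lgamma(p + 1) - math.lgamma(q)
                  + p * math.log(x) + q * math.log(1 - x))
    return math.exp(log_factor) * _cf(p, q, x)


def prob_within(f, delta, t0, tol=1e-14):
    """P(-t0 < t < t0) for non-central t with f degrees of freedom."""
    lam = delta ** 2 / 2.0
    y = t0 ** 2 / (f + t0 ** 2)
    total, weight, k = 0.0, 1.0, 0
    while weight > tol and k < 500:
        total += weight * inc_beta(k + 0.5, f / 2.0, y)
        weight *= lam / (k + 1)
        k += 1
    return math.exp(-lam) * total


if __name__ == "__main__":
    n1 = n2 = 10
    delta = 2 * 0.1 * math.sqrt(n1 * n2 / (n1 + n2))  # mu/sigma = 0.1
    print(round(1.0 - prob_within(n1 + n2 - 2, delta, 2.101), 4))
```

For $\delta = 0$ the series reduces to its first term, the ordinary t probability, which gives a quick check of the Beta-function routine.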

The following tables show results for α = 0.05 and 0.01, $\mu/\sigma$ = 0.1, 0.2, and
0.5, and $N_1 = N_2 = 10$ and 20.

Values of $P(t^2 > t_0^2)$

$N_1 = N_2 = 10$

    α \ μ/σ       0.1        0.2        0.5
    0.05        0.0708     0.1355     0.5621
    0.01        0.0165     0.0396     0.2940

$N_1 = N_2 = 20$

    α \ μ/σ       0.1        0.2        0.5
    0.05        0.0947     0.2345     0.8691
    0.01        0.0251     0.0862     0.6730


In only one case was it necessary to calculate as many as ten terms of the 
corresponding series to obtain these values. 
NOTE ON AN APPLICATION OF RUNS TO QUALITY CONTROL CHARTS

By Frederick Mosteller
Princeton University

In the application of statistical methods to quality control work, a customary 
procedure is to construct a control chart with control limits spaced about the 
mean such that under conditions of statistical control, or random sampling, the 
probability of an observation falling outside these limits is a given a (e.g., .05). 
The occurrence of a point outside these limits is taken as an indication of the 
presence of assignable causes of variation in the production line. Such a form 
of chart has been found to be of particular value in the detection of the presence
of assignable causes of variability in the quality of manufactured product. As
recently pointed out, however, the statistician may not only help to detect the
presence of assignable causes, but also help to discover the causes themselves in
the course of further research and development. For this purpose, runs of
different kinds and of different lengths have been found useful by industrial
statisticians.¹ Quality control engineers have found, at least in research and
development work, that a convenient indication of lack of control is the
occurrence of long runs of observations whose values lie above or below that of the
median of the sample. For example (as will be shown below), at least one
succession of 9 or more observations above or below the median in a sample of 40
would be taken as evidence of lack of control at the .05 level; meaning that
under conditions of control such a run would occur in approximately 5 per cent
of the samples. Since this type of test has been found useful by quality control
engineers, it is perhaps desirable to discuss the mathematical basis of such tests
of control and provide a brief table for samples of various sizes at the
significance levels .05 and .01.

The general distribution theory of runs of k kinds of elements, and in particular 
that of two kinds has been thoroughly investigated by A. M. Mood.² The
purpose of this note is to give an application of the general method to quality 
control. 

Let us consider a sample of size 2n drawn from a continuous distribution
function f(x). These are then arranged in the order in which they were drawn.
We now separate the sample into two sets by considering the nth and (n + 1)st
elements in order of magnitude; then if $x_i \le x_n$, $x_i$ will be called an a, and if
$x_i \ge x_{n+1}$, $x_i$ will be called a b. A run of a's will be defined as usual as a
succession of a's terminated at each end by the occurrence of a b (with the obvious
exceptions where the run includes the first or last element of the sample), and


¹ The use of "runs up" and "runs down" as well as runs above and below the arithmetic
mean of a sample were briefly described in a paper by W. A. Shewhart, "Contribution of
statistics to the science of engineering," before the Bicentennial Celebration of the
University of Pennsylvania, September 17, 1940, to be published in the proceedings of that
meeting. In a paper, "Mathematical statistics in mass production," presented before the
American Mathematical Society in February, 1941, Shewhart discussed some of the
advantages of using runs above and below the median and showed how by comparing runs of
different types in a given problem it is often possible to fix rather definitely the source of
trouble. The present note considers only the frequency of occurrence of "long" runs which
are often used by research and development engineers to indicate the presence of assignable
causes of variation. The occurrence of more than one such run in a given sequence, if
distributed above and below the median value may also constitute valid evidence of the
presence of more than one state of statistical control between which the phenomena may
oscillate. The interpretation of long runs in this sense, however, is not considered in the
present note.

² A. M. Mood, "The distribution theory of runs," Annals of Math. Stat., Vol. 11 (1940),
pp. 367-392.
runs of b's are defined similarly. A run of a's may conveniently be called a
run "below the median," and a run of b's a run "above the median."

We shall use Mood's notation throughout, i.e., $r_{1i}$, $r_{2i}$, (i = 1, 2, ..., n) are
the number of runs of a's and b's respectively of length i, and $r_1$, $r_2$ are the total
number of runs of a's and b's;



will indicate a multinomial coefficient, and 


a binomial coefficient. 


Also we define 


$F(r_1, r_2) = 0, \quad |r_1 - r_2| > 1,$

$F(r_1, r_2) = 1, \quad |r_1 - r_2| = 1,$

$F(r_1, r_2) = 2, \quad |r_1 - r_2| = 0.$


Then the distribution of runs of a’s for our case is 


n + l\ 



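We would like to find the probability of at least one run of s or more a's. The
coefficient of $x^n$ in

(2)  $$[x + x^2 + \cdots + x^{s-1}]^{r_1}$$

gives the number of ways of partitioning n elements into $r_1$ partitions such that
no partition contains s or more elements, and none is void. Rewriting (2) we
have

$$\left[\frac{x(1 - x^{s-1})}{1-x}\right]^{r_1},$$

and the coefficient of $x^n$ is just

(3)  $$\sum_{j=0}^{r_1}(-1)^j\binom{r_1}{j}\binom{n-1-j(s-1)}{r_1-1}.$$

Then the probability that we desire, of getting at least one run of s or more a's
is immediately given by

$$P(r_{1i} \ge 1,\ i \ge s) = 1 - \frac{\displaystyle\sum_{r_1=1}^{n}\left[\sum_{j=0}^{r_1}(-1)^j\binom{r_1}{j}\binom{n-1-j(s-1)}{r_1-1}\right]\binom{n+1}{r_1}}{\displaystyle\binom{2n}{n}}.$$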
Noting that when j = 0 in the inner summation we have just the total number
of partitions, we get finally

(4)  $$P(r_{1i} \ge 1,\ i \ge s) = \frac{1}{\binom{2n}{n}}\sum_{r_1=1}^{n-s+1}\binom{n+1}{r_1}\sum_{j \ge 1}(-1)^{j+1}\binom{r_1}{j}\binom{n-1-j(s-1)}{r_1-1}.$$


A similar result of course holds for the b's.

If we desire the probability of getting at least one run of s or more of either
a's or b's, we compute the probability of getting no runs of this type and
subtract from unity. Expression (3) multiplied by the total number of ways of
getting no partitions of s or more b's for a given $r_1$, and then summed on $r_1$
gives exactly the number of ways of getting no runs of either a's or b's as great
as s. This is

(5)  $$A = \sum_{r_1}\sum_{r_2}F(r_1, r_2)\left[\sum_{i}(-1)^i\binom{r_1}{i}\binom{n-1-i(s-1)}{r_1-1}\right]\left[\sum_{j}(-1)^j\binom{r_2}{j}\binom{n-1-j(s-1)}{r_2-1}\right]$$

and the probability desired is

(6)  $$P(r_{1i} \ge 1 \text{ or } r_{2i} \ge 1 \text{ or both};\ i \ge s) = 1 - A\Big/\binom{2n}{n}.$$


In spite of the complex appearance of A, the sum can be rapidly calculated for 
any given n since the calculations for the sums on i andj need not be duplicated. 

In the case of a quality control chart, we set a significance level α for a given n;
this determines s, the length of run of either type necessary for significance at the
level chosen. Suppose we are interested only in runs occurring on one side of
the median, say above, when α = .05, n = 20 (i.e., sample size equal to 40).
We determine the least value of s which will make the right hand side of
equation (4) less than or equal to .05. It turns out that s = 8 for this case. This
means that under conditions of statistical control, i.e., random sampling, one or
more runs of length 8 or more, above the median will occur in approximately
5 per cent of samples of size 40. Naturally an identical result holds when we
are considering only runs below the median.

On the other hand, if under the same conditions as given above (n = 20,
α = .05), we are using as our criterion of statistical control the occurrence of
runs of length s or greater either above or below the median, we must determine
the least value of s such that $1 - A/\binom{2n}{n} \le .05$. This value turns out to be 9.
In other words under conditions of statistical control at least one run of at least 9
will occur either above or below the median in less than 5 per cent of the cases
on the average.
The following table gives smallest lengths of runs for .05 and .01 significance
levels for samples of size 10, 20, 30, 40, 50.

          Runs on one side of median      Runs on either side of median
   2n        α = .05      α = .01            α = .05      α = .01

   10           5            -                  5            -
   20           7            8                  7            8
   30           8            9                  8            9
   40           8            9                  9           10
   50           8                              10           11


If there is an odd number of individuals, say 2n + 1, in the sample, we would
choose the value of the median as the dividing line for our sample and treat the
data as if there were only 2n cases, thus ignoring the median completely.

The following table³ gives the probabilities of getting at least one run of s
or more on one side, either side, and each side of the median for samples of size 10,
20, and 40.


Length             2n = 10                    2n = 20                    2n = 40
of Run       One    Either   Each       One    Either   Each       One    Either   Each
  (s)        Side    Side    Side       Side    Side    Side       Side    Side    Side

   1        1.000   1.000   1.000      1.000   1.000   1.000      1.000   1.000   1.000
   2         .976    .992    .960      1.000   1.000   1.000      1.000   1.000   1.000
   3         .500    .667    .333       .870    .956    .784       .992    .999    .986
   4         .143    .230    .056       .457    .640    .274       .799    .930    .668
   5         .024    .040    .008       .178    .293    .064       .450    .650    .249
   6                                    .060    .106    .013       .207    .346    .068
   7                                    .017    .032    .002       .087    .158    .016
   8                                    .004    .007    .000       .034    .065    .004
   9                                    .001    .001    .000       .013    .025    .001
  10                                    .000    .000    .000       .005    .009    .000
  11                                                               .002    .003    .000
  12                                                               .000    .001    .000
  13                                                               .000    .000    .000


One method of computing such a table is to use expression (4) to obtain the 
probabilities on one side, and to use (6) to get probabilities for either side. 
Then the probabilities for runs on each side may be computed by using the 
relationship 

2P (one side) — P (either side) = P (each side). 
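
The method just outlined can be sketched in code. The Python functions below are an illustration and not the computation actually used for the table: the function names are hypothetical, and the sketch assumes the reading of the formulas that the surrounding text describes, namely that expression (3) counts the divisions of n elements into $r_1$ non-void parts each shorter than s, that the $r_1$ runs of a's can be placed among the n b's in C(n + 1, $r_1$) ways, and that A is the double sum of F($r_1$, $r_2$) times the two counts.

```python
from math import comb


def _c(n, k):
    """Binomial coefficient, taken as zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def short_partitions(n, r, s):
    """Expression (3): ordered divisions of n into r non-void parts, each < s."""
    return sum((-1) ** j * comb(r, j) * _c(n - 1 - j * (s - 1), r - 1)
               for j in range(r + 1))


def p_one_side(n, s):
    """P(at least one run of s or more a's) in a sample of 2n."""
    no_long = sum(short_partitions(n, r, s) * comb(n + 1, r)
                  for r in range(1, n + 1))
    return 1.0 - no_long / comb(2 * n, n)


def p_either_side(n, s):
    """P(at least one run of s or more a's or b's) in a sample of 2n."""
    counts = [0] + [short_partitions(n, r, s) for r in range(1, n + 1)]
    a_total = 0
    for r1 in range(1, n + 1):
        for r2 in range(max(1, r1 - 1), min(n, r1 + 1) + 1):
            f = 2 if r1 == r2 else 1  # F(r1, r2) for |r1 - r2| <= 1
            a_total += f * counts[r1] * counts[r2]
    return 1.0 - a_total / comb(2 * n, n)


def p_each_side(n, s):
    """P(a run of s or more on each side), from 2P(one) - P(either)."""
    return 2.0 * p_one_side(n, s) - p_either_side(n, s)


if __name__ == "__main__":
    for s in range(1, 6):
        print(s, round(p_one_side(5, s), 3), round(p_either_side(5, s), 3),
              round(p_each_side(5, s), 3))
```

For 2n = 10 and s = 5 the counts can be checked by hand: the five a's are consecutive in 6 of the 252 arrangements, and either the a's or the b's are consecutive in 10 of them, which gives .024 and .040 as in the table.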

³ The author is indebted to Dr. P. S. Olmstead of the Bell Telephone Laboratories
for kindly placing this table at his disposal. Dr. Olmstead has pointed out that these 
probabilities have been found very useful in research and development work. 
TEST OF HOMOGENEITY FOR NORMAL POPULATIONS


By G. A. Baker
University of California

1. Introduction. In biological experiments it is often of interest to test
whether or not all the subjects can be regarded as coming from the same normal
population. If they have not come from the same normal population, usually
the most plausible alternative is that the subjects have come from a population
which is the combination of two or more normal populations combined in some
proportions. The combination of normal populations is a "smooth" alternative
to the hypothesis of a single normal population. Such non-homogeneous
populations are not the only "smooth" alternatives, of course, but are included
among the "smooth" alternatives. If there is reason to believe that the only
deviation from a normal population is due to non-homogeneity, then the results
of Professor Neyman in his paper [1] are available in studying this problem.

It is desirable not to make any hypotheses about the mean and standard 
deviation of the sampled population, but to base all computations and tests on 
the data contained in the sample. Such a viewpoint has been stressed in a 
previous paper [2] where it was shown that if the sampling is from a normal 
population, the probability of a deviation from the mean of a first sample of n 
measured in terms of the standard deviation of the sample is proportional to 


(1.1)  $$\frac{dv}{\left\{1 + \dfrac{v^2}{n+1}\right\}^{n/2}}.$$

The result (1.1) and Neyman's results give rise to a test of homogeneity which
is valid for "large" samples. Empirical results show that fairly conclusive
evidence of non-homogeneity may be obtained with samples of 100. Samples of 50
or less may be suggestive but rarely decisive.


2. Development of Test. Suppose that a sample of n + 1 is drawn from a
normal population. It can be regarded as being made up of a first sample of n
and a second sample of one. The value of v corresponding to (1.1) can then be
computed and its distribution function is (1.1). This partition, of course, can
be made in n + 1 ways. That is, n + 1 values of v are determined from a
random sample of n + 1 from the original parent. It is true that these values
of v are not independent among themselves. The correlation between the values
of v, to a first approximation at least, is of the order of 1/n and can be neglected
if n is "large."

A suitable transformation as discussed in [3], [1] and elsewhere, transforms 
(1.1) into a rectangular distribution. 

If the same computations are made when the sampled population is not 
normal, then the resulting values obtained will not be rectangularly distributed.
For instance, suppose that the sampled population is 


( 2 . 1 ) 




f(x) = -4= 


we find that the distribution of v based on the first sample of 2 is a very
complicated expression involving sums of exponentials and definite integrals of
exponentials. To obtain a rectangular distribution if the sampled population is
normal, the appropriate transformation to make is

(2.2)  $$v = -\sqrt{3}\cot \pi u, \qquad dv = \sqrt{3}\,\pi\csc^2 \pi u\,du.$$

The resulting u-distribution for population (2.1) then is to be compared with
the rectangular distribution in the interval from zero to one.

For "large" values of n + 1 and for symmetrical non-homogeneous
populations composed of two normal components, the u-distribution will be
symmetrical about $\tfrac12$, less than one near the ends, greater than one for values
of u moderately far from $\tfrac12$ and less than one for values of u near $\tfrac12$. A Neyman
$\psi_k^2$ [1] of order 4 will be necessary to detect a difference of this sort. If the
non-homogeneous population of two components is skewed, the u-distribution
will still show the same two-humped effect but may be skewed instead of
symmetrical. A Neyman $\psi_k^2$ of order 4 should still be computed, although $\psi_3^2$ may
be more significant.

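The test then consists of:

(a) computing the n + 1 quantities

(2.3)  $$v_i = \frac{x_i - \bar x_i}{s_i} \qquad (i = 1, 2, 3, \ldots, n+1)$$

where

n + 1 = number in the sample
$x_i$ = the observed values
$x_j'$ = the observed values except $x_i$
$\bar x_i = \frac{1}{n}\sum_{j=1}^{n} x_j'$
$s_i^2 = \frac{1}{n}\sum_{j=1}^{n}(x_j' - \bar x_i)^2$

(b) making the transformation

$$u_i = \int_{-\infty}^{v_i}\frac{y_0\,dx'}{\left(1 + \dfrac{x'^2}{n+1}\right)^{n/2}} \qquad (i = 1, 2, 3, \ldots, n+1)$$

(c) computing the first four $\psi_k^2$'s of Neyman's paper [1]

(d) comparing with $\psi^2_{(\epsilon)}(k)$ as found from the Incomplete Gamma Function
Tables.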
If n is large, say n = 100, then u is given approximately by the normal
probability integral.

If n is small, the values of u are obtained from the Table 25 of Vol. 2 of
Pearson's Tables.

Neyman's derivation assumes that n + 1 is large and that the u's are
independent. In this case, if n + 1 is large, then the u's are nearly independent,
and hence the test is valid. The same procedure can be applied for smaller
samples. It can not be expected that small differences from normal in the
sampled population can be detected with small samples. Empirical results
indicate that samples of 100 are necessary for decisive results even when the
differences of the sampled population from a normal homogeneous population
are large. Samples of 50 may be suggestive and in very extreme cases might be
decisive.


TABLE I

Empirical Sampling Results

                                              k = 1      k = 2      k = 3      k = 4

$\psi_k^2$'s for 51 from population A         .0001       .843      2.009      7.464
$\psi_k^2$'s for 101 from population A        .086       2.403      4.998     12.868
$\psi_k^2$'s for 101 from population B        .553        .927      7.472      7.485
$\psi_k^2$'s for 101 from normal              .017        .082      1.288      1.663
$\psi^2_{(.05)}(k)$'s (Neyman [1])           3.842       5.992      7.815      9.488
$\psi^2_{(.01)}(k)$'s (Neyman [1])           6.635       9.210     11.345     13.277


It is to be noted that the test makes no assumption about the parameters of
the sampled population and does not group the data. The application of the
test gives a unique result that does not depend on the judgment of the computer
in any respect. In applying the usual chi-square test the computer must choose
groupings. The choice of groupings as indicated in [5] may change the P-values
to very different levels of significance.

3. Empirical results. Samples of 51 and 101 from population A, of 101 from
population B, and of 101 from a normal population, were drawn by throwing
dice. Populations A and B are given in [4]. Population A is symmetrical and
distinctly bimodal. Population B is weakly bimodal and strongly skewed.

For samples from population A it is necessary to compute $\psi_4^2$. For samples
from population B it may be sufficient to compute $\psi_3^2$. The non-homogeneity
of the type of population A seems to be somewhat more detectable than of the
type of population B. The sample from the normal parent shows close
conformity with expectation.

In applying the proposed test for homogeneity the u-values for small
independent sets of data can be combined to give a much larger number of u-values.
A NOTE ON THE POWER OF THE SIGN TEST

By W. Mac Stewart 
University of Wisconsin 

1. Introduction. Let us consider a set of N non-zero differences, of which x
are positive and N - x are negative; and suppose that the hypothesis tested,
$H_0$, implies, in independent sampling, that x will be distributed about an
expected value of N/2 in accordance with the binomial $(\tfrac12 + \tfrac12)^N$. As a quick
test of $H_0$, we may choose to test the hypothesis $h_0$ that x has the above
probability distribution. Defining r to be the smaller of x and N - x, the test
consists in rejecting $h_0$ and therefore $H_0$ whenever $r \le r(\epsilon, N)$, where $r(\epsilon, N)$ is
determined by N and the significance level ε.

2. Power of a test. In applying such a test it is of interest to know how
frequently it will lead to a rejection of $H_0$ when $H_0$ is false and the situation $H_1$
implies that the probability law of x is $(q + p)^N$, with $p \ne \tfrac12$, thereby indicating
an expectation of an unequal number of + and - differences. The
probability of rejecting $H_0$ when $H_1$ implying $p = p_1$ is true, is termed the power of
the test of $H_0$ relative to the alternative $H_1$.¹ Thus, from the point of view of
experimental design the power (P) of the test of $H_0$ may be considered a
function of the alternative hypothesis $H_1$, the significance level ε, and N. As such,
the following observations may be noted:

1. The power $P_2$, for an assumed ε, N and $H_2$ implying $p = p_2$ is greater
than or equal to the power $P_1$ for ε, N and $H_1$ implying $p = p_1$ where
$|p_2 - .50| > |p_1 - .50|$.

¹ For an extensive discussion of the power of a test, the reader is referred to J. Neyman
and E. S. Pearson, Statistical Research Memoirs, Vol. 1 (1936), pp. 3-6.

2. The power $P_2$ for an assumed $H_1$, N, and $\epsilon_2$ is greater than or equal to the
power $P_1$ for $H_1$, N, and $\epsilon_1$, where $\epsilon_2 > \epsilon_1$.

3. The power $P_2$ for an assumed $H_1$, ε, and $N_2$ is greater than or equal to the
power $P_1$ for $H_1$, ε, and $N_1$ where $N_2 > N_1$.

Hence, to increase the power of the test of $H_0$ relative to a particular $H_1$,
the methods implied in observations 2 and/or 3 may be employed. However,
if any increase in an established ε is undesirable, the method implied in
observation 3 is the alternative.

3. Explanation of table. In the interests of efficiency and economy, two questions then arise: (1) What is the minimum value of $N$ which, at the significance level $\epsilon$, will give the test of $H_0$ a power $P > \beta$ relative to a particular alternative hypothesis $H_1$? (2) For this minimum value of $N$ corresponding to $\epsilon$, what is the maximum value of $r$? Stated in another manner, the questions are these: (1) What is the smallest number (min $N$) of paired samples that must be employed in conjunction with the Sign Test in order that the test of $H_0$, at the significance level $\epsilon$, shall have a power $P > \beta$ relative to an alternative hypothesis $H_1$? (2) If $x$ of these paired samples give rise to a positive difference, and (min $N - x$) a negative difference, and if $r$ be defined as the smaller of $x$ and (min $N - x$), then what is the maximum value that $r$ may attain and still have the results, at the level $\epsilon$, judged significant?

Table I provides the answers to these questions for the significance level $\epsilon \le .05$; and (1) for $H_1$ implying $p = p_1$ for values of $p_1$ from .60 to .95 (and thereby from .40 to .05) at intervals of .05; (2) for values of $\beta$ from .05 to .95 at intervals of .05, and also for $\beta = .99$. For example, assume that a power $P > .80$ relative to the alternative hypothesis $H_6$ ($p_1 = .70$) is desired. In Table I, the entry appearing in the column headed $H_6$ ($p_1 = .70$), and in the row $P > .80$, is 49,17, indicating that 49 paired samples are required, of which 17 or fewer must be of one sign ($+$ or $-$) and hence 32 or more must be of the opposite sign in order that the results be significant at the .05 level.

Because of the discreteness of the binomial distribution, it is impossible to maintain the level of significance at .05 or even arbitrarily close to that figure and still hold to the criterion that $N$ shall be at a minimum. For that reason, particularly when min $N$ is small, results significant at .05 according to Table I may be significant at a level $\epsilon'$, where $\epsilon'$ is considerably less than .05. In general, however, and in particular when min $N$ is large (greater than 60), both the quantities $(.05 - \epsilon')$ and $(P - \beta)$ are small.
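The gap between the nominal level .05 and the attained level $\epsilon'$ can be made concrete. The sketch below is an illustration only; the names `attained_level` and `max_r` are assumptions introduced here. It finds, for a given $N$, the largest $r$ whose two-sided level under $p = .50$ does not exceed $\epsilon$, and reports that level.

```python
from math import comb


def attained_level(n, r):
    """Two-sided level P(min(x, n - x) <= r) when x ~ Binomial(n, 1/2)."""
    hits = sum(comb(n, x) for x in range(n + 1) if min(x, n - x) <= r)
    return hits / 2**n


def max_r(n, eps=0.05):
    """Largest r with attained level <= eps, or None if even r = 0 fails."""
    best = None
    for r in range(n // 2 + 1):
        if attained_level(n, r) <= eps:
            best = r
        else:
            break  # the level only grows with r, so stop at the first failure
    return best


if __name__ == "__main__":
    for n in (5, 6, 9):
        r = max_r(n)
        level = None if r is None else attained_level(n, r)
        print(n, r, level)
```

For $N = 9$ the largest admissible value is $r = 1$, with attained level $20/512$, noticeably below .05; for $N = 5$ no value of $r$ is admissible.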

4. Illustrative example. Goulden² describes a simple experiment in identifying varieties of wheat. In this experiment, a wheat "expert" is presented paired grain samples of two particular varieties of wheat. The object of the experiment is to test the ability of the expert to differentiate between the two varieties by arranging the pairs so that samples of one variety are on the left, say, and samples of the other variety are on the right.

² C. H. Goulden, Methods of Statistical Analysis, John Wiley and Sons, New York, 1939, p. 2.

TABLE I

Minimum number of paired samples and maximum values of related r
$H_0$: $p_0 = .50$ (5% level of significance, i.e., $\epsilon \le .05$)
Entries are (min N, max r)

Power         H_1       H_2       H_3       H_4       H_5       H_6       H_7       H_8
              p_1=.95   p_1=.90   p_1=.85   p_1=.80   p_1=.75   p_1=.70   p_1=.65   p_1=.60
0 < P < .05   -         -         -         -         -         -         7,0       6,0
P > .05       -         -         -         -         -         7,0       6,0       9,1
P > .10       -         -         -         -         7,0       6,0       9,1       17,4
P > .15       -         -         -         8,0       6,0       9,1       12,2      25,7
P > .20       -         -         -         7,0       10,1      13,2      17,4      37,12
P > .25       -         -         8,0       6,0       14,2      12,2      23,6      44,15
P > .30       -         -         7,0       11,1      9,1       18,4      25,7      56,20
P > .35       -         -         6,0       10,1      12,2      17,4      30,9      65,24
P > .40       -         8,0       -         9,1       16,3      20,5      35,11     74,28
P > .45       -         7,0       11,1      -         15,3      26,7      42,14     89,35
P > .50       -         6,0       10,1      13,2      18,4      25,7      44,15     101,40
P > .55       -         -         9,1       12,2      17,4      30,9      51,18     112,45
P > .60       -         -         14,2      15,3      20,5      36,11     56,20     125,51
P > .65       7,0       11,1      13,2      19,4      23,6      35,11     63,23     143,59
P > .70       6,0       10,1      12,2      18,4      25,7      40,13     67,26     158,66
P > .75       -         9,1       16,3      17,4      28,8      44,15     79,30     175,74
P > .80       -         14,2      15,3      20,5      30,9      49,17     90,35     199,85
P > .85       11,1      12,2      18,4      25,7      35,11     56,20     101,40    227,98
P > .90       9,1       15,3      17,4      28,8      42,14     65,24     114,46    263,115
P > .95       12,2      17,4      23,6      35,11     49,17     79,30     143,59    327,146
P > .99       15,3      23,6      30,9      44,15     67,25     110,44    199,86    453,205

In a problem of this type, it is desirable to have a sufficiently large number, $N$, of paired samples in order that the following conditions be fulfilled: (1) the probability that a person possessing no discriminating ability pass the test through sheer guesswork be less than $\epsilon$; and (2) if past experience has proven that an expert does possess the ability to discriminate between the varieties to the extent of placing a proportion, $p_1$, of the pairs correctly in the long run, then the probability that he will pass the test be $P$.

Under these conditions, how large an $N$ is required, and for that $N$, what is the maximum number of pairs that may be incorrectly placed without failing the test? For alternative hypothesis $H_5$ ($p_1 = .75$), and for $P > .90$, referring to Table I, it is seen that 42 paired samples must be employed and not more than 14 may be placed incorrectly. Under the same alternative hypothesis, if it be required merely that $P > .50$ (i.e., an expert with an ability of .75 have better than an even chance of passing), then only 18 paired samples are necessary and not more than 4 may be arranged incorrectly.

Thus, before conducting an experiment in which the Sign Test is to be employed, if the experimenter first decides what power the test must have relative to a certain alternative hypothesis, then from the accompanying table he may learn the minimum number of paired samples that are necessary, and the related maximum value of $r$.

If this procedure is not followed, and an experimenter employs, say, 6 paired samples, he may (as can be seen from the table) discover, to his dismay, that "experts" of ability .75 will be unrecognized more than 80% of the time.


MOMENTS OF THE RATIO OF THE MEAN SQUARE SUCCESSIVE 
DIFFERENCE TO THE MEAN SQUARE DIFFERENCE IN 
SAMPLES FROM A NORMAL UNIVERSE 

By J. D. Williams
Phoenix, Arizona

The following result may have considerable application to trend analysis. 
The specific problem was proposed to me by R. H. Kent. 

Consider a sample $O_n$: $X_1, X_2, \ldots, X_n$ from a normal population with zero mean and variance $\sigma^2$, the variates being arranged in temporal order. We seek the moments of the ratio of $\delta^2$ to $S^2$, where

(1) $(n - 1)\delta^2 = \sum_{j=1}^{n-1} (X_{j+1} - X_j)^2$

and

(2) $nS^2 = \sum_{j=1}^{n} (X_j - \bar{X})^2.$

Here $\bar{X}$ is the mean of the $X_j$. In order to simplify the algebra, we will work with quantities $A$ and $B$ defined by

(3) $2\sigma^2 A = (n - 1)\delta^2, \qquad 2\sigma^2 B = nS^2.$

The characteristic function for the joint distribution of $A$ and $B$ is

(4) $\varphi(t_1, t_2) = E\{e^{t_1 A + t_2 B}\},$

where $t_1$ and $t_2$ are pure imaginaries. For the method of analysis which will be used here $t_1$ and $t_2$ will be considered as real variables. By straightforward methods we have

(5) $\varphi^{-2} = \begin{vmatrix} a & b & d & d & \cdots & d \\ b & c & b & d & \cdots & d \\ d & b & c & b & \cdots & d \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ d & \cdots & d & b & c & b \\ d & \cdots & d & d & b & a \end{vmatrix},$

where the determinant is of $n$th order and its elements are

(6) $a = 1 - t_1 - (n - 1)\tau, \quad b = t_1 + \tau, \quad c = 1 - 2t_1 - (n - 1)\tau, \quad d = \tau = t_2/n.$

It can be verified that the determinant has the value 


(7) 


i-o\ 3 / 


<*) 


n-/-l 


where the symbol $\binom{k}{j}$ represents a binomial coefficient. From (7) we find the moments $m_j$ of $A/B$ as follows: Setting


( 8 ) 

we have 

(9) 


— 




tOi 


IH 


<*) 


mi 


ft dtik 

< l «*0 


(n - l)(n + 1) . . . (n + 2j - 3) ■ 


The result is rather unexpected, for we have established that the moments of $A/B$ are equal to the moments of $A$ divided by the moments of $B$.

We find the following explicit values for the first few moments $m_j$:

(10)
$m_0 = 1$
$m_1 = 2$
$(n - 1)(n + 1)\, m_2 = 4(n^2 + n - 3)$
$(n - 1)(n + 1)(n + 3)\, m_3 = 8(n^3 + 6n^2 + 2n - 21)$
$(n - 1)(n + 1)(n + 3)(n + 5)\, m_4 = 16(n^4 + 14n^3 + 53n^2 - 8n - 231).$

These are valid subject to the restriction $2n - 1 > j$, because in arriving at the explicit forms we have treated the binomial coefficient $\binom{k}{j}$ as if it were identically equal to $k(k - 1) \cdots (k - j + 1)/j!$.

From (10) it is easy to pass to the moments of $R = \delta^2/S^2$. For example, we find the mean value and variance of $R$ to be

$\frac{2n}{n - 1}$

and

$\frac{4n^2(n - 2)}{(n + 1)(n - 1)^3}$

respectively.
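The explicit values (10) can be checked independently in exact rational arithmetic. The sketch below is an illustration, not the author's derivation; the function names are assumptions introduced here. It takes $A$ and $B$ to be one half of the sum of squared successive differences and one half of the sum of squared deviations of $n$ independent standard normal variates, and it relies on the result stated above that the moments of $A/B$ equal the moments of $A$ divided by the moments of $B$. The moments of $A$ are built from the cumulants of a quadratic form in normal variates, and those of $B$ from the chi-square distribution with $n - 1$ degrees of freedom.

```python
from fractions import Fraction
from math import comb, factorial


def difference_matrix(n):
    """Matrix L with x'Lx equal to the sum of squared successive differences."""
    L = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        L[i][i] += 1
        L[i + 1][i + 1] += 1
        L[i][i + 1] -= 1
        L[i + 1][i] -= 1
    return L


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def ratio_moments(n, jmax):
    """Moments m_0..m_jmax of A/B, computed as E[A^j] / E[B^j]."""
    L = difference_matrix(n)
    power, kappa = L, []
    for r in range(1, jmax + 1):
        trace = sum(power[i][i] for i in range(n))
        # r-th cumulant of A = z'Lz / 2 for standard normal z
        kappa.append(Fraction(factorial(r - 1), 2) * trace)
        power = matmul(power, L)
    mu = [Fraction(1)]  # raw moments of A from its cumulants
    for j in range(1, jmax + 1):
        mu.append(sum(comb(j - 1, r - 1) * kappa[r - 1] * mu[j - r]
                      for r in range(1, j + 1)))
    moments, eb = [Fraction(1)], Fraction(1)
    for j in range(1, jmax + 1):
        eb *= Fraction(n + 2 * j - 3, 2)  # E[B^j] for 2B ~ chi-square(n - 1)
        moments.append(mu[j] / eb)
    return moments


if __name__ == "__main__":
    print(ratio_moments(3, 4))
```

For $n = 3$ the computed moments agree with the values obtained from (10) for $j = 1, \ldots, 4$.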




ON THE INTEGRAL EQUATION OF RENEWAL THEORY

By Willy Feller
Brown University

1. Introduction. In this paper we consider the behavior of the solutions of the integral equation

(1.1) $u(t) = g(t) + \int_0^t u(t - x) f(x)\,dx,$

where $f(t)$ and $g(t)$ are given non-negative functions.¹ This equation appears, under different forms, in population theory, the theory of industrial replacement and in the general theory of self-renewing aggregates, and a great number of papers have been written on the subject.² Unfortunately most of this literature is of a heuristic nature so that the precise conditions for the validity of different methods or statements are seldom known. This literature is, moreover, abundant in controversies and different conjectures which are sometimes supported or disproved by unnecessarily complicated examples. All this renders an orientation exceedingly difficult, and it may therefore be of interest to give a rigorous presentation of the theory. It will be seen that some of the previously announced results need modifications to become correct.

The existence of a solution $u(t)$ of (1.1) could be deduced directly from a well-known result of Paley and Wiener [21] on general integral equations of form (1.1).³ However, the case of non-negative functions $f(t)$ and $g(t)$, with which we are here concerned, is much too simple to justify the deep methods used by Paley and Wiener in the general case. Under the present conditions, the existence of a solution can be proved in a simple way using properties of completely monotone functions, and this method has also the distinct advantage of showing some properties of the solutions, which otherwise would have to be proved separately. It will be seen in section 3 that the existence proof becomes most natural if equation (1.1) is slightly generalized. Introducing the summatory functions

(1.2) $U(t) = \int_0^t u(x)\,dx, \qquad F(t) = \int_0^t f(x)\,dx, \qquad G(t) = \int_0^t g(x)\,dx,$

equation (1.1) can be rewritten in the form

(1.3) $U(t) = G(t) + \int_0^t U(t - x)\,dF(x).$

¹ For the interpretation of the equation cf. section 2.

² Lotka's paper [8] contains a bibliography of 74 papers on our subject published before 1939. Yet it is stated that even this list "is not the result of an exhaustive search." At the end of the present paper the reader will find a list of 16 papers on (1.1) which have appeared during the two years since the publication of Lotka's paper.

³ This has been remarked also by Hadwiger [3].


However, (1.3) has a meaning even if $F(t)$ and $G(t)$ are not integrals, provided $F(t)$ is of bounded total variation and the integral is interpreted as a Stieltjes integral. Now for many practical applications (and even for numerical calculations) this generalized form of the integral equation seems to be the most appropriate one and, as a matter of fact, it has sometimes been used in a more or less hidden form (e.g., if all individuals of the parent population are of the same age). Our existence theorem refers to this generalized equation.
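The Stieltjes form of the equation lends itself directly to computation when $F$ and $G$ are step functions. The following sketch is an illustration under stated assumptions, not a method taken from the paper: $F$ and $G$ are assumed to have jumps $f_k$ and $g_k$ only at the integers $t = k$, with no jump of $F$ at $t = 0$. Taking jumps on both sides of $U(t) = G(t) + \int_0^t U(t - x)\,dF(x)$ then gives the recursion $u_n = g_n + \sum_{k=1}^{n} f_k u_{n-k}$ for the jumps $u_n$ of $U$. The function name `renewal_jumps` is hypothetical.

```python
def renewal_jumps(f, g, n_max):
    """Jumps u_0..u_n_max of U when F and G are step functions.

    f[k] and g[k] are the jumps of F and G at t = k; f[0] is assumed
    to be zero (no replacement at zero age).
    """
    u = []
    for n in range(n_max + 1):
        total = g[n] if n < len(g) else 0.0
        for k in range(1, min(n, len(f) - 1) + 1):
            total += f[k] * u[n - k]  # replacements of items installed at n - k
        u.append(total)
    return u


if __name__ == "__main__":
    # Lifetimes of 1 or 2 time units with equal probability; the parent
    # population is of zero age at t = 0, so that G = F.
    lifetimes = [0.0, 0.5, 0.5]
    print(renewal_jumps(lifetimes, lifetimes, 10))
```

In this example the first jumps are .5, .75, .625, and the later ones settle near 2/3, the reciprocal of the mean lifetime 1.5.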

We then turn to one of the main problems of the theory, namely the asymptotic behavior of $u(t)$ as $t \to \infty$. It is generally supposed that the solution $u(t)$ "in general" either behaves like an exponential function, or that it approaches in an oscillating manner a finite limit $q$; the latter case should arise if $\int_0^\infty f(t)\,dt = 1$, thus in particular in the cases of a stable population and of industrial replacement. However, special examples have been constructed to show that this is not always so.⁴ In order to simplify the problem and to get more general conditions, we shall first (section 4) consider only the question of convergence in mean, that is to say, we shall study the asymptotic behavior not of $u(t)$ itself but of the mean value $u^*(t) = \frac{1}{t}\int_0^t u(x)\,dx$. The question can be solved completely using only the simplest Tauberian theorems for Laplace integrals. Of course, if $u(t) \to q$ then also $u^*(t) \to q$, but not conversely. The investigation of the precise asymptotic behavior of $u(t)$ is more delicate and requires more refined tools (section 5).

Most of section 6 is devoted to a study of Lotka's well-known method of expanding $u(t)$ into a series of oscillatory components, and it is hoped that this study will help clarify the true nature of this expansion. It will be seen that Lotka's method can be justified (with some necessary modifications) even in some cases for which it was not intended, e.g., if the characteristic equation has multiple or negative real roots, or if it has only a finite number of roots. On the other hand limitations of the method will also become apparent: thus it can occur in special cases that a formal application of the method will lead to a function $u(t)$ which apparently solves the given equation, whereas in reality it is the solution of quite a different equation.

⁴ Cf. Hadwiger [2] and also Hadwiger, "Zur Berechnung der Erneuerungsfunktion nach einer Formel von V. A. Kostitzin," Mitt. Verein. Schweizerischer Versich.-Math., Vol. 34 (1937), pp. 37-43.

Of course, most of the difficulties mentioned above arise only when the function $f(t)$ has an infinite tail. However, it is known that even computational considerations sometimes require the use of such curves, and, as a matter of fact, exponential and Pearsonian curves have been used most frequently in connection with (1.1). It will be seen that even in these special cases customary methods may lead to incorrect results. Besides, our considerations show how much the solution $u(t)$ is influenced by the values of $f(t)$ for $t \to \infty$, and, accordingly, that extreme caution is needed in practice. The last section contains some simple remarks on the practical computation of the solution.
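For orientation, equation (1.1) can be solved numerically on a grid by replacing the integral with a quadrature rule and solving for $u$ step by step. The sketch below uses the trapezoidal rule; it is an illustration only and is not the procedure of the paper's last section. The function name `solve_renewal` and the choice of step are assumptions, and the scheme requires $h f(0)/2 < 1$.

```python
import math


def solve_renewal(f, g, t_max, h):
    """Approximate u(0), u(h), ..., u(t_max) for u = g + (u convolved with f).

    Trapezoidal rule on a uniform grid; the unknown u_i appears in the
    end-point term of the quadrature and is solved for explicitly.
    """
    n = int(round(t_max / h))
    fv = [f(i * h) for i in range(n + 1)]
    gv = [g(i * h) for i in range(n + 1)]
    u = [gv[0]]  # at t = 0 the integral vanishes, so u(0) = g(0)
    for i in range(1, n + 1):
        acc = 0.5 * u[0] * fv[i]
        for j in range(1, i):
            acc += u[i - j] * fv[j]
        u.append((gv[i] + h * acc) / (1.0 - 0.5 * h * fv[0]))
    return u


if __name__ == "__main__":
    # Exponential lifetimes with g = f: the exact solution is u(t) = 1.
    density = lambda t: math.exp(-t)
    values = solve_renewal(density, density, 5.0, 0.01)
    print(values[0], values[len(values) // 2], values[-1])
```

The exponential case serves as a check, since $u(t) \equiv 1$ satisfies (1.1) exactly when $f(t) = g(t) = e^{-t}$.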

2. Generalities on equations (1.1) and (1.3). This section contains a few
remarks on the meaning of our integral equation and on an alternative form 
under which it is encountered in the literature. A reader interested only in the 
abstract theory may pass immediately to section 3. 

Equation (1.1) can be interpreted in various ways; the most important among 
them are the following two: 

(i) In the theory of industrial replacement (as outlined in particular by Lotka), it is assumed that each individual dropping out is immediately replaced by a new member of zero age. $f(t)$ denotes the density of the probability at the moment of installment that an individual will drop out at age $t$. The function $g(t)$ is defined by

(2.1) $g(t) = \int_0^\infty \eta(x) f(t + x)\,dx,$

where $\eta(x)$ represents the age distribution of the population at the moment $t = 0$ (so that the number of individuals of an age between $x$ and $x + \delta x$ is $\eta(x)\delta x + o(\delta x)$). Obviously $g(t)$ then represents the rate of dropping out at time $t$ of individuals belonging to the parent population. Finally, $u(t)$ denotes the rate of dropping out at time $t$ of individuals of the total population. Now each individual dropping out at time $t$ belongs either to the parent population, or it came to the population by the process of replacement at some moment $t - x$ $(0 < x < t)$, and hence $u(t)$ satisfies (1.1). It is worthwhile to note that in this case

(2.2) $\int_0^\infty f(t)\,dt = 1,$

since $f(t)$ represents a density of probability.

(ii) In population theory $u(t)$ measures the rate of female births at time $t > 0$. The function $f(t)$ now represents the reproduction rate of females at age $t$ (that is to say, the average number of female descendants born during $(t, t + \delta t)$ from a female of age $t$ is $f(t)\delta t + o(\delta t)$). If again $\eta(x)$ stands for the age distribution of the parent population at $t = 0$, the function $g(t)$ of (2.1) will obviously measure the rate of production of females at time $t$ by members of the parent population. Thus we are again led to (1.1), with the difference, however, that this time either of the inequalities

(2.3) $\int_0^\infty f(t)\,dt \gtrless 1$

may occur; the value of this integral shows the tendency of increase or decrease in the total population.

Theoretically speaking, $f(t)$ and $g(t)$ are two arbitrary non-negative functions. It is true that $g(t)$ is connected with $f(t)$ by (2.1); but, since the age distribution $\eta(x)$ is arbitrary, $g(t)$ can also be considered as an arbitrarily prescribed function.

It is hardly necessary to interpret the more general equation (1.3) in detail: it is the straightforward generalization of (1.1) to the case where the increase or decrease of the population is not necessarily a continuous process. This form of the equation is frequently better adapted to practical needs. Indeed, the functions $f(t)$ and $g(t)$ are usually determined from observations, so that only their mean values over some time units (years) are known. In such cases it is sometimes simpler to treat $f(t)$ and $g(t)$ as discontinuous functions, using equation (1.3) instead of (1.1). For some advantages of such a procedure see section 7. It may also be mentioned that the most frequently (if not the only) studied special case of (1.1) is that where $g(t) = f(t)$. Now it is apparent from (2.1) that this means that all members of the parent population are of zero age: in this case, however, there is no continuous age-distribution $\eta(x)$. Instead we have to use a discontinuous function $\eta(x)$ and write (2.1) in the form of a Stieltjes integral. Thus discontinuous functions and Stieltjes integrals present themselves automatically, though in a somewhat disguised form, even in the simplest cases.

At this point a remark may be inserted which will prove useful for a better understanding later on (section 6). In the current literature we are frequently confronted not with (1.1) but with

(2.4) $u(t) = \int_0^\infty u(t - x) f(x)\,dx,$

together with the explanation that it is asked to find a solution of (2.4) which reduces, for $t < 0$, to a prescribed function $h(t)$. Now such a function, as is known, exists only under very exceptional conditions, and (2.4) is by no means equivalent to (1.1). The current argument can be boiled down to the following. Suppose first that the function $g(t)$ of (1.1) is given in the special form

(2.5) $g(t) = \int_t^\infty h(t - x) f(x)\,dx,$

where $h(x)$ is a non-negative function defined for $x < 0$. Since the solution $u(t)$ of (1.1) has a meaning only for $t > 0$, we are free to define that $u(-t) = h(-t)$ for $t > 0$. This arbitrary definition, then, formally reduces (1.1) to (2.4). It should be noted, however, that this function $u(t)$ does not, in general, satisfy (2.4) for $t < 0$, for $h(t)$ was prescribed arbitrarily. Thus we are not, after all, concerned with (2.4) but with (1.1), which form of the equation is, by the way, the more general one for our purposes. If there really existed a solution of (2.4) which reduced to $h(t)$ for $t < 0$, we could of course define $g(t)$ by (2.5) and transform (2.4) into (1.1) by splitting the interval $(0, \infty)$ into the subintervals $(0, t)$ and $(t, \infty)$. However, as was already mentioned, a solution of the required kind does not exist in general. It will also be seen (section 6) that the true nature of the different methods and the limits of their applicability can be understood only when the considerations are based on the proper equation (1.1) and not on (2.4).

3. Existence of solutions.

Theorem 1. Let $F(t)$ and $G(t)$ be two finite non-decreasing functions which are continuous to the right.⁵ Suppose that

(3.1) $F(0) = G(0) = 0,$

and that the Laplace integrals⁶

(3.2) $\varphi(s) = \int_0^\infty e^{-st}\,dF(t), \qquad \gamma(s) = \int_0^\infty e^{-st}\,dG(t)$

converge at least for $s > \sigma \ge 0$.⁷ In case that $\lim_{s \to \sigma+0} \varphi(s) > 1$, let $\sigma' > \sigma$ be the root⁸ of the characteristic equation $\varphi(s) = 1$; in case $\lim_{s \to \sigma+0} \varphi(s) \le 1$, put $\sigma' = \sigma$.

Under these conditions there exists for $t > 0$ one and only one finite non-decreasing function $U(t)$ satisfying (1.3). With this function the Laplace integral

(3.3) $\omega(s) = \int_0^\infty e^{-st}\,dU(t)$

converges for $s > \sigma'$, and

(3.4) $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)}.$

⁵ It is needless to emphasize that this restriction is imposed only to avoid trivial ambiguities.

⁶ The integrals (3.2) should be interpreted as Lebesgue-Stieltjes integrals over open intervals; thus

$\varphi(s) = \lim_{\epsilon \to +0} \int_\epsilon^\infty e^{-st}\,dF(t),$

which implies that $\varphi(s) \to 0$ as $s \to \infty$. Alternatively it can be supposed that $F(t)$ and $G(t)$ have no discontinuities at $t = 0$. Continuity of $F(t)$ at $t = 0$ means that there is no reproduction at zero age. This assumption is most natural for our problem, but is by no means necessary. In order to investigate the case where $F(t)$ has a saltus $c > 0$ at $t = 0$, one should take the integrals (3.2) over the closed set $[0, \infty]$, so that

$\varphi(s) = c + \lim_{\epsilon \to +0} \int_\epsilon^\infty e^{-st}\,dF(t).$

It is readily seen that Theorem 1 and its proof remain valid if $0 < c < 1$. However, if $c > 1$, then (1.3) plainly has no solution $U(t)$. The continuity of $G(t)$ at $t = 0$ is of no importance and is not used in the sequel.

⁷ The condition is formulated in this general way in view of later applications (cf., e.g., the lemma of section 4). In all cases of practical interest $\sigma = 0$.

⁸ $\varphi(s)$ is, of course, monotonic for $s > \sigma$ and tends to zero as $s \to \infty$. In order to ensure the existence of a root of $\varphi(s) = 1$, it is sufficient to suppose that the saltus $c$ of $F(t)$ at $t = 0$ is less than 1 (cf. footnote 6).

Proof: A trivial computation shows that for any finite non-decreasing solution $U(t)$ of (1.3) and any $T > 0$ we have

$\int_0^T e^{-st}\,dU(t) = \int_0^T e^{-st}\,dG(t) + \int_0^T e^{-sx}\,dF(x) \int_0^{T-x} e^{-st}\,dU(t);$

herein all terms are non-negative and hence by (3.2)

$\int_0^T e^{-st}\,dU(t) \le \gamma(s) + \varphi(s) \int_0^T e^{-st}\,dU(t).$

Now $\varphi(s) < 1$ for $s > \sigma'$, and hence it is seen that the integral (3.3) exists for $s > \sigma'$ and satisfies (3.4). On the other hand it is well-known that the values of $\omega(s)$ for $s > \sigma'$ determine the corresponding function $U(t)$ uniquely, except for an additive constant, at all points of continuity. However, from (1.3) and (3.1) it follows that $U(0) = 0$ and, since by (1.3) $U(t)$ is continuous to the right, the monotone solution $U(t)$ of (1.3), if it exists, is determined uniquely.

To prove the existence of $U(t)$ consider a function $\omega(s)$ defined for $s > \sigma'$ by (3.4). It is clear from (3.2) that $\varphi(s)$ and $\gamma(s)$ are completely monotone functions, that is to say that $\varphi(s)$ and $\gamma(s)$ have, for $s > \sigma$, derivatives of all orders and that $(-1)^n \varphi^{(n)}(s) \ge 0$ and $(-1)^n \gamma^{(n)}(s) \ge 0$. We can therefore differentiate (3.4) any number of times, and it is seen that $\omega^{(n)}(s)$ is continuous for $s > \sigma'$. Now a simple inductive argument shows that $(-1)^n \omega^{(n)}(s)$ is a product of $\{1 - \varphi(s)\}^{-(n+1)}$ by a finite number of completely monotone functions. It follows that $(-1)^n \omega^{(n)}(s) \ge 0$, so that $\omega(s)$ is a completely monotone function, at least for $s > \sigma'$. Hence it follows from a well-known theorem of S. Bernstein and D. V. Widder⁹ that there exists a non-decreasing function $U(t)$ such that (3.3) holds for $s > \sigma'$. Moreover, this function can obviously be so defined that $U(0) = 0$ and that it is continuous to the right. Using $U(t)$ let us form a new function

(3.5) $V(t) = \int_0^t U(t - x)\,dF(x).$

$V(t)$ is clearly non-negative and non-decreasing. It is readily verified (and, of course, well-known) that

$\psi(s) = \int_0^\infty e^{-st}\,dV(t) = \omega(s)\varphi(s).$

It follows, therefore, from (3.4) that $\psi(s) = \omega(s) - \gamma(s)$, and this implies, by the uniqueness theorem for Laplace transforms, that $V(t) = U(t) - G(t)$. Combining this result with (3.5) it is seen that $U(t)$ is a solution of (1.3).

⁹ This theorem has been repeatedly proved by several authors; for a recent proof cf. Feller [19].

Theorem 2. Suppose that $f(t)$ and $g(t)$ are measurable, non-negative and bounded in every finite interval $0 \le t \le T$. Let the integrals

(3.6) $\varphi(s) = \int_0^\infty e^{-st} f(t)\,dt, \qquad \gamma(s) = \int_0^\infty e^{-st} g(t)\,dt$

converge for $s > \sigma$. Then there exists one and only one non-negative solution $u(t)$ of (1.1) which is bounded in every finite interval.¹⁰ With this function the integral

(3.7) $\omega(s) = \int_0^\infty e^{-st} u(t)\,dt$

converges at least for $s > \sigma'$, where $\sigma' = \sigma$ if $\lim_{s \to \sigma+0} \varphi(s) \le 1$, and otherwise $\sigma' > \sigma$ is defined as the root of the characteristic equation $\varphi(s) = 1$. For $s > \sigma'$ equation (3.4) holds.

If $f(t)$ is continuous except, perhaps, at a finite number of points then $u(t) - g(t)$ is continuous.

Proof: Define $F(t)$ and $G(t)$ by (1.2). Under the present conditions these functions satisfy the conditions of Theorem 1, and hence (1.3) has a non-decreasing solution $U(t)$. Consider, then, an arbitrary interval $0 \le t \le T$ and suppose that in this interval $f(t) < M$ and $g(t) < M$. If $0 \le t < t + h \le T$ we have by (1.3)

$0 \le \frac{1}{h}\{U(t + h) - U(t)\}$
$= \frac{1}{h}\{G(t + h) - G(t)\} + \frac{1}{h}\int_t^{t+h} U(t + h - x) f(x)\,dx + \frac{1}{h}\int_0^t \{U(t + h - x) - U(t - x)\} f(x)\,dx$
$\le M + MU(T) + \frac{M}{h}\int_0^t \{U(t + h - x) - U(t - x)\}\,dx$
$= M + MU(T) + \frac{M}{h}\left\{\int_t^{t+h} U(y)\,dy - \int_0^h U(y)\,dy\right\}$
$\le M + 2MU(T).$

Thus $U(t)$ has bounded difference ratios and is therefore an integral. The derivative $U'(t)$ exists for almost all $t$ and $0 < U'(t) < M$. Accordingly we can differentiate (1.3) formally, and since $U(0) = 0$ it follows that $u(t) = U'(t)$ satisfies (1.1) for almost all $t$. However, changing $u(t)$ on a set of measure zero does not affect the integral in (1.1), and since $g(t)$ is defined for all $t$ it is seen that $u(t)$ can be defined, in a unique way, so as to satisfy (1.1) and obtain (1.3). Since the solution of (1.3) was uniquely determined it follows that the solution $u(t)$ is also unique. Obviously equations (3.7) and (3.3) define the same function $\omega(s)$, so that (3.4) holds, and (3.7) converges for $s > \sigma'$.

Finally, if $f(t)$ has only a finite number of jumps, the continuity of $u(t) - g(t)$ becomes evident upon writing (1.1) in the form

$u(t) - g(t) = \int_0^t u(x) f(t - x)\,dx.$

¹⁰ Without the assumptions of positiveness and boundedness this theorem reduces to a special case of a theorem by Paley and Wiener [21]; cf. section 1, p. 243.


4. Asymptotic properties. In this section we shall be concerned with the asymptotic behavior as $t \to \infty$ not of $u(t)$ itself but of the mean value $u^*(t) = \frac{1}{t}\int_0^t u(\tau)\,d\tau$. If $u(t)$ tends to the (not necessarily finite) limit $C$, then obviously also $u^*(t) \to C$, whereas the converse is not necessarily true. For the proof of the theorem we shall need the following obvious but useful

Lemma: If $u(t) \ge 0$ is a solution of (1.1) and if

(4.1) $u_1(t) = e^{kt} u(t), \qquad f_1(t) = e^{kt} f(t), \qquad g_1(t) = e^{kt} g(t),$

then $u_1(t)$ is a solution of

$u_1(t) = g_1(t) + \int_0^t u_1(t - x) f_1(x)\,dx.$

Theorem 3: Suppose that using the functions defined in Theorem 2 the integrals

(4.2) $\int_0^\infty f(t)\,dt = a, \qquad \int_0^\infty g(t)\,dt = b$

are finite.

(i) In order that

(4.3) $u^*(t) = \frac{1}{t}\int_0^t u(\tau)\,d\tau \to C$

as $t \to \infty$, where $C$ is a positive constant, it is necessary and sufficient that $a = 1$, and that the moment

(4.4) $m = \int_0^\infty t f(t)\,dt$

be finite. In this case

(4.5) $C = \frac{b}{m}.$

(ii) If $a < 1$ we have

(4.6) $\int_0^\infty u(t)\,dt = \frac{b}{1 - a}.$

(iii) If $a > 1$ let $\sigma'$ be the positive root of the characteristic equation $\varphi(s) = 1$ (cf. (3.2)) and put¹¹

(4.7) $\int_0^\infty e^{-\sigma' t}\, t f(t)\,dt = m', \qquad \int_0^\infty e^{-\sigma' t} g(t)\,dt = b'.$

Then

(4.8) $\lim_{t \to \infty} \frac{1}{t}\int_0^t e^{-\sigma' \tau} u(\tau)\,d\tau = \frac{b'}{m'}.$

Remark: The case $a = 1$ corresponds in demography to a population of stationary size. In the theory of industrial replacement only the case $a = 1$ occurs; the moment $m$ is the average lifetime of an individual. The case $a > 1$ corresponds in demography to a population in which the fertility is greater than the mortality. As is seen from (4.8), in this case the mean value of $u(t)$ increases exponentially. It is of special interest to note that in a population with $a < 1$ the integral (4.6) always converges.
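The limits in Theorem 3 can be illustrated through the Laplace transforms alone. The sketch below is an illustration under stated assumptions, not part of the paper: it takes the relation $\omega(s) = \gamma(s)/\{1 - \varphi(s)\}$ between the transforms of $u$, $g$ and $f$ (the relation referred to in the text as (3.4)) and evaluates it near $s = 0$ for two hypothetical choices of $f$, with $g(t) = e^{-t}$ in both. For $f(t) = t e^{-t}$ one has $a = 1$, $b = 1$, $m = 2$; for $f(t) = \tfrac{1}{2} e^{-t}$ one has $a = \tfrac{1}{2}$, $b = 1$.

```python
def omega(s, phi, gamma):
    """Transform of u obtained from the transforms of f and g."""
    return gamma(s) / (1.0 - phi(s))


def gamma_exp(s):
    """Transform of g(t) = exp(-t); here b = gamma(0) = 1."""
    return 1.0 / (1.0 + s)


def phi_stationary(s):
    """Transform of f(t) = t exp(-t): a = 1, first moment m = 2."""
    return 1.0 / (1.0 + s) ** 2


def phi_declining(s):
    """Transform of f(t) = exp(-t) / 2: a = 1/2."""
    return 0.5 / (1.0 + s)


if __name__ == "__main__":
    s = 1e-6
    # Case a = 1: s * omega(s) should be close to b / m.
    print(s * omega(s, phi_stationary, gamma_exp))
    # Case a < 1: omega(s) should be close to b / (1 - a).
    print(omega(s, phi_declining, gamma_exp))
```

In the first case $s\omega(s) = (1 + s)/(2 + s)$ exactly, which tends to $b/m = 1/2$; in the second $\omega(s) = 1/(s + 1/2)$, which tends to $b/(1 - a) = 2$.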

Proof: By (4.2) and (3.7)

(4.9) $\lim_{s \to +0} \varphi(s) = a, \qquad \lim_{s \to +0} \gamma(s) = b.$

If $a < 1$, it follows from (3.4) that $\lim_{s \to +0} \omega(s) = b/(1 - a)$ is finite. Since $u(t) \ge 0$ this obviously implies that (4.6) holds. This proves (ii).

If $a = 1$ and $m$ is finite, it is readily seen that

$\lim_{s \to +0} \frac{1 - \varphi(s)}{s} = m,$

and hence by (3.4)

$\lim_{s \to +0} s\omega(s) = \lim_{s \to +0} \gamma(s) \cdot \lim_{s \to +0} \frac{s}{1 - \varphi(s)} = \frac{b}{m}.$

By a well-known Tauberian theorem for Laplace integrals of non-negative functions¹² it follows that $u^*(t) \to b/m$. Conversely, if (4.3) holds it is readily seen that¹³

$\lim_{s \to +0} s\omega(s) = C,$

which in turn implies by (3.4) and (4.9) that

$\lim_{s \to +0} \frac{1 - \varphi(s)}{s} = \frac{b}{C}.$

This obviously means that the moment (4.4) exists and equals $b/C$. This proves (i).

Finally case (iii) reduces immediately to (i) using the above lemma with $k = -\sigma'$, since then $\int_0^\infty f_1(t)\,dt = \varphi(\sigma') = 1$. This finishes the proof.

¹¹ (4.2) implies the finiteness of $m'$.

¹² Cf. e.g. Doetsch [18], p. 208 or 210.

¹³ Indeed, if (4.3) holds and if $U(t)$ is defined by (1.2), then there is an $M = M(\epsilon)$ such that $|U(t) - Ct| < M + \epsilon t$. Now

$s\omega(s) - C = s^2 \int_0^\infty e^{-st}\{U(t) - Ct\}\,dt,$

and hence

$|s\omega(s) - C| \le s^2 \int_0^\infty e^{-st}(M + \epsilon t)\,dt = sM + \epsilon.$

It may be remarked that the finiteness of the integrals (4.2) is by no means necessary for (4.3). This is shown by the following

Example: Let

$f(t) = \frac{1}{2\sqrt{\pi}}\, t^{-3/2} e^{-1/(4t)}, \qquad g(t) = \frac{1}{\sqrt{\pi t}}\, e^{-1/(4t)}.$

It is readily seen that with these functions $a = 1$, but $b = \infty$. Now¹⁴ $\varphi(s) = e^{-\sqrt{s}}$ and $\gamma(s) = e^{-\sqrt{s}}/\sqrt{s}$, so that

$\omega(s) = \frac{e^{-\sqrt{s}}}{\sqrt{s}\,(1 - e^{-\sqrt{s}})}.$

Thus $s\omega(s) \to 1$ as $s \to +0$, and hence $u^*(t) \to 1$. In this particular case it can even be shown that the solution $u(t)$ itself tends to 1 as $t \to \infty$.
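The limit of $s\omega(s)$ in this example is easy to confirm numerically. The short sketch below is an illustration only; the function name is an assumption. It evaluates $s\omega(s)$ for $\omega(s) = e^{-\sqrt{s}}/\{\sqrt{s}\,(1 - e^{-\sqrt{s}})\}$, computing $1 - e^{-\sqrt{s}}$ through `math.expm1` so that small $s$ does not suffer from cancellation.

```python
import math


def s_omega_example(s):
    """s * omega(s) for omega(s) = exp(-sqrt(s)) / (sqrt(s) * (1 - exp(-sqrt(s))))."""
    root = math.sqrt(s)
    one_minus_exp = -math.expm1(-root)  # accurate value of 1 - exp(-sqrt(s))
    return s * math.exp(-root) / (root * one_minus_exp)


if __name__ == "__main__":
    for s in (1e-2, 1e-4, 1e-8):
        print(s, s_omega_example(s))
```

The printed values increase toward 1 as $s$ decreases, since $s\omega(s) = \sqrt{s}/(e^{\sqrt{s}} - 1)$.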

In practice, however, the integrals (4.2) will always exist, and accordingly we 
restrict the consideration to this case. 


5. Closer study of asymptotic properties. In this section we shall deal almost exclusively with the most important special case, namely where

(5.1) $\int_0^\infty f(t)\,dt = 1.$

The question has been much discussed whether in this case necessarily $u(t) \to C$ as $t \to \infty$, which statement, if true, would be a refinement of (4.3). Hadwiger [2] has constructed a rather complicated example to show that $u(t)$ does not necessarily approach a limit. Now this can also be seen directly and without any computations. Indeed, if $u(t) \to C$ and if (5.1) holds, then obviously

$\lim_{t \to \infty} \int_0^t u(t - x) f(x)\,dx = C,$

and hence it follows from (1.1) that $g(t) \to 0$. In order that $u(t) \to C$ it is therefore necessary that $g(t) \to 0$, and this proves the assertion. In Hadwiger's example $\limsup g(t) = \infty$, which makes his computations unnecessary.

It can be shown in a similar manner that not even the condition $f(t) \to 0$ is sufficient to ensure that $u(t) \to C$. Some restriction as to the total variation of $f(t)$ seems both necessary and natural (conditions on the existence of derivatives are not sufficient). In the following theorem we shall prove the convergence of $u(t)$ under a condition which is, though not strictly necessary, sufficiently wide to cover all cases of any possible practical interest.

¹⁴ The integrals can be evaluated by elementary methods, and are known; cf. e.g. Doetsch [18], p. 25.

Theorem 4: Suppose that with the functions $f(t)$ and $g(t)$ of Theorem 2

(5.2) $\int_0^\infty f(t)\,dt = 1, \qquad \int_0^\infty g(t)\,dt = b < \infty.$

Suppose moreover that there exists an integer $n \ge 2$ such that the moments

(5.3) $m_k = \int_0^\infty t^k f(t)\,dt, \qquad k = 1, 2, \ldots, n,$

are finite, and that the functions $f(t), tf(t), t^2 f(t), \ldots, t^{n-2} f(t)$ are of bounded total variation over $(0, \infty)$. Suppose finally that

(5.4) $\lim_{t \to \infty} t^{n-2} g(t) = 0$ and $\lim_{t \to \infty} t^{n-2} \int_t^\infty g(x)\,dx = 0.$

Then

(5.5) $\lim_{t \to \infty} u(t) = \frac{b}{m_1}$

and

(5.6) $\lim_{t \to \infty} t^{n-2} \left\{ u(t) - \frac{b}{m_1} \right\} = 0.$

Remark: As it was shown in section 4, the case where $\int_0^\infty f(t)\,dt > 1$
can readily be reduced to the above theorem by applying the lemma of section 4
with $k = \sigma'$, where $\sigma'$ is the positive root of $\varphi(s) = 1$: it is only necessary to
suppose that $f(t)$ is of bounded total variation and that $g(t) \to 0$. Obviously
all moments of $f_1(t) = e^{-\sigma' t} f(t)$ exist, so that the above theorem shows that
$u_1(t) = e^{-\sigma' t} u(t)$ tends to the finite limit $b'/m_1'$, where

$b' = \int_0^\infty e^{-\sigma' t} g(t)\,dt, \qquad m_1' = \int_0^\infty e^{-\sigma' t}\, t f(t)\,dt.$

Thus in this case and under the above assumptions $u(t) \sim \frac{b'}{m_1'} e^{\sigma' t}$, so that the
renewal function increases exponentially as could be expected. If however

$\int_0^\infty f(t)\,dt < 1,$

$u(t)$ will in general not show an exponential character. If $f(t)$ is of bounded
variation and has a finite moment of second order, and if $g(t) \to 0$, then it can be
shown that $u(t) \to 0$. However, the lemma of section 4 can be applied only if
the integral defining $\varphi(s)$ converges in some negative $s$-interval containing a value
$s'$ such that $\varphi(s') = 1$, and this is in general not the case.

Proof: The proof of Theorem 4 will be based on a Tauberian theorem due to
Haar. With some specializations and obvious changes this theorem can be
formulated as follows.

Suppose that $l(t)$ is, for $t > 0$, non-negative and continuous, and that the
Laplace integral

(5.7)    $\lambda(s) = \int_0^\infty e^{-st} l(t)\,dt$

converges for $s > 0$. Consider $\lambda(s)$ as a function of the complex variable
$s = x + iy$ and suppose that the following conditions are fulfilled:

(i) For $y \ne 0$ the function $\lambda(s)$ (which is always regular for $x > 0$) has
continuous boundary values $\lambda(iy)$ as $x \to +0$; for $x \ge 0$ and $y \ne 0$

(5.8)    $\lambda(s) = \frac{C}{s} + \psi(s),$

where $\psi(iy)$ has finite derivatives $\psi'(iy), \dots, \psi^{(r)}(iy)$ and $\psi^{(r)}(iy)$ is bounded
in every finite interval;

(ii) $\int_{-\infty}^{+\infty} e^{ity} \lambda(x + iy)\,dy$
converges for some fixed $x > 0$ uniformly with respect to $t > T > 0$;

(iii) $\lambda(x + iy) \to 0$ as $y \to \pm\infty$, uniformly with respect to $x > 0$;

(iv) $\lambda'(iy), \lambda''(iy), \dots, \lambda^{(r)}(iy)$ tend to zero as $y \to \pm\infty$;

(v) The integrals

$\int_{-\infty}^{y_1} e^{ity} \lambda^{(r)}(iy)\,dy$  and  $\int_{y_2}^{\infty} e^{ity} \lambda^{(r)}(iy)\,dy$

(where $y_1 < 0$ and $y_2 > 0$ are fixed) converge uniformly with respect to $t > T > 0$.
Under these conditions

(5.9)    $\lim_{t \to \infty} t^r \{ l(t) - C \} = 0.$

Now the hypotheses of this theorem are too restrictive to be applied to the
solution $u(t)$ of (1.1). We shall therefore replace (1.1) by the more special
equation

(5.10)    $v(t) = h(t) + \int_0^t v(t - x) f(x)\,dx,$

where

(5.11)    $h(t) = \int_0^t f(t - x) f(x)\,dx.$

Footnote: Haar [20] or Doetsch [18], p. 209.

Plainly Theorem 2 can be applied to (5.10). It is also plain that $h(t)$ is bounded
and non-negative and that (by (5.1))

(5.12)    $\int_0^\infty h(t)\,dt = 1,$

(5.13)    $\chi(s) = \int_0^\infty e^{-st} h(t)\,dt = \varphi^2(s).$

Accordingly we have by Theorem 2

(5.14)    $\zeta(s) = \int_0^\infty e^{-st} v(t)\,dt = \frac{\varphi^2(s)}{1 - \varphi(s)}.$

We shall first verify that $\zeta(s)$ satisfies the conditions of Haar's theorem with
$r = n - 2$. For this purpose we write

(5.15)    $f(t) = f_1(t) - f_2(t),$

where $f_1(t)$ and $f_2(t)$ are non-decreasing and non-negative functions which are,
by assumption, bounded:

(5.16)    $0 \le f_1(t) < M, \qquad 0 \le f_2(t) < M.$

(a) We show that $v(t)$ is continuous. Now by Theorem 2 the solution $v(t)$
of (5.10) is certainly continuous if $h(t)$ is continuous; however, that $h(t)$ is
continuous follows directly from (5.11) and the fact that the functions

$\int_0^t f_1(t - x) f(x)\,dx$  and  $\int_0^t f_2(t - x) f(x)\,dx$

are continuous.

(b) In view of (5.1) the function $\varphi(s)$ exists for $x = \Re(s) \ge 0$. Obviously
$|\varphi(x + iy)| < 1$ for $x > 0$. Now

$1 - \varphi(iy) = \int_0^\infty (1 - e^{-iyt}) f(t)\,dt = \int_0^\infty (1 - \cos yt) f(t)\,dt + i \int_0^\infty \sin yt \cdot f(t)\,dt,$

and, since $1 - \cos yt \ge 0$ and $f(t) \ge 0$, the equality $\varphi(iy) = 1$ for $y \ne 0$ would
imply that $f(t) = 0$ except on a set of measure zero. It is therefore seen that
$\varphi(x + iy) \ne 1$ for all $x > 0$ and for $x = 0$, $y \ne 0$.

It follows furthermore from (5.3) that for $k = 1, \dots, n$ and $x > 0$ the derivatives

$\varphi^{(k)}(s) = (-1)^k \int_0^\infty e^{-st} t^k f(t)\,dt$

exist and that

$\lim_{x \to +0} \varphi^{(k)}(x + iy) = \varphi^{(k)}(iy).$

Finally, it is readily seen that in the neighborhood of $y = 0$ we have

(5.17)    $\varphi(iy) = 1 - m_1\, iy + \frac{m_2}{2!} (iy)^2 - + \cdots + (-1)^{n-1} \frac{m_{n-1}}{(n-1)!} (iy)^{n-1} + O(|y|^n).$


(c) From what was said under (b) it follows by (5.14) that $\zeta(s)$ is regular for
$x > 0$, and that $\zeta(s), \zeta'(s), \dots, \zeta^{(n)}(s)$ approach continuous boundary values
as $s = x + iy$ approaches a point of the imaginary axis other than the origin.
Now put

(5.18)    $\psi(s) = \frac{\varphi^2(s)}{1 - \varphi(s)} - \frac{1}{m_1 s},$

so that by (5.14)

(5.19)    $\zeta(s) = \frac{1}{m_1 s} + \psi(s).$

For $x > 0$ and $x = 0$, $y \ne 0$ the function $\psi(x + iy)$ is obviously continuous;
the derivatives $\psi'(iy), \dots, \psi^{(n)}(iy)$ exist. To investigate the behavior of
$\psi(iy)$ in the neighborhood of $y = 0$ put

(5.20)    $P(y) = m_1 - \frac{m_2}{2!} (iy) + - \cdots + (-1)^{n-2} \frac{m_{n-1}}{(n-1)!} (iy)^{n-2}.$

By (5.17), (5.18) and (5.20)

(5.21)    $\psi(iy) = \left[ \frac{m_1 \{1 - iy P(y)\}^2}{P(y)} - 1 \right] \frac{1}{m_1\, iy} + O(|y|^{n-2}).$

Now the expression in brackets represents an analytic function of $y$ which
vanishes at $y = 0$. Hence $\psi(iy) = \mathfrak{P}(y) + O(|y|^{n-2})$, where $\mathfrak{P}(y)$ denotes a
power series. It follows that the derivatives $\psi'(iy), \dots, \psi^{(n-2)}(iy)$ exist for all
real $y$ (including $y = 0$) and are bounded for sufficiently small $|y|$: since they
are continuous functions they are bounded in every finite interval.

(d) Next we show that there exists a constant $A > 0$ such that for sufficiently
large $|y|$

(5.22)    $|\varphi(x + iy)| < \frac{A}{|y|}$

uniformly in $x > 0$. By (5.15)

(5.23)    $\varphi(s) = \int_0^\infty \{\cos yt - i \sin yt\}\, e^{-xt} \{f_1(t) - f_2(t)\}\,dt.$

Now $f_1(t)$ is non-decreasing and accordingly by the second mean-value theorem
we have for any $T > 0$ and $y \ne 0$

$\int_0^T \cos yt \cdot f_1(t)\,dt = f_1(T) \int_\tau^T \cos yt\,dt = f_1(T)\, \frac{\sin yT - \sin y\tau}{y},$

where $\tau$ is some value between 0 and $T$ (depending, of course, on $y$; at points of
discontinuity, $f_1(T)$ should be replaced by $\lim_{t \to T - 0} f_1(t)$). Hence by (5.16)

$\left| \int_0^\infty \cos yt \cdot e^{-xt} f_1(t)\,dt \right| \le \frac{2M}{|y|}.$

Treating the other terms in (5.23) in a like manner, (5.22) follows.

Combining (5.22) with (5.14) it is seen that for sufficiently large $|y|$

$|\zeta(x + iy)| = O(|y|^{-2})$

uniformly in $x > 0$. This shows that the assumptions (ii) and (iii) of Haar's
theorem are satisfied for $\lambda(s) = \zeta(s)$. In order to prove that also conditions
(iv) and (v) are satisfied it suffices to notice that the proof of (5.22) used only
the fact that $f(t)$ is of bounded total variation. Now $\varphi^{(k)}(s)$ is the Laplace
transform of $(-t)^k f(t)$, and, since $t^k f(t)$ is of bounded total variation for $k \le n - 2$,
it follows that

$|\varphi^{(k)}(s)| = O(|y|^{-1}), \qquad k = 1, 2, \dots, n - 2,$

for sufficiently large $|y|$, uniformly in $x > 0$. Differentiating (5.14) $k$ times it is
also seen that

$|\zeta^{(k)}(s)| = O(|y|^{-2}), \qquad k = 1, 2, \dots, n - 2,$

as $y \to \pm\infty$, uniformly with respect to $x > 0$.

This enumeration shows that $l(t) = v(t)$ and $\lambda(s) = \zeta(s)$ satisfy all hypotheses
of Haar's theorem with $r = n - 2$ and $C = 1/m_1$. Hence

(5.24)    $\lim_{t \to \infty} t^{n-2} \left\{ v(t) - \frac{1}{m_1} \right\} = 0.$

Returning now to (5.14) we get

$\omega(s) = \gamma(s) + \gamma(s)\varphi(s) + \gamma(s)\zeta(s),$

or, by the uniqueness property of Laplace integrals,

(5.25)    $u(t) = g(t) + \int_0^t g(x) f(t - x)\,dx + \int_0^t g(x) v(t - x)\,dx = g(t) + u_1(t) + u_2(t)$

(which relation can also be checked directly using (5.10)). Let us begin with
the last term. We have by (5.2)

$u_2(t) - \frac{b}{m_1} = \int_0^t g(x) \left\{ v(t - x) - \frac{1}{m_1} \right\} dx - \frac{1}{m_1} \int_t^\infty g(x)\,dx,$

and hence

$t^{n-2} \left\{ u_2(t) - \frac{b}{m_1} \right\} = t^{n-2} \int_{t/2}^{t} g(t - x) \left\{ v(x) - \frac{1}{m_1} \right\} dx + t^{n-2} \int_{t/2}^{t} g(y) \left\{ v(t - y) - \frac{1}{m_1} \right\} dy - \frac{t^{n-2}}{m_1} \int_t^\infty g(x)\,dx.$

If $t$ is sufficiently large we have by (5.24) in the first integral
$x^{n-2} \left| v(x) - \frac{1}{m_1} \right| < \epsilon$.
In the second integral $v(t - y) - \frac{1}{m_1}$ is bounded, and hence by (5.4)

$\lim_{t \to \infty} t^{n-2} \left\{ u_2(t) - \frac{b}{m_1} \right\} = 0.$

The same argument applies (even with some simplifications) also to the second
term in (5.25); it follows that

$\lim_{t \to \infty} t^{n-2} u_1(t) = 0,$

whilst $t^{n-2} g(t) \to 0$ by assumption (5.4). Now the assertion (5.6) of our theorem
follows in view of (5.25) if the last three relationships are added. This finishes
the proof of Theorem 4.
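The limit $b/m_1$ asserted in (5.5) can be checked numerically on a simple case. The following is only a sketch under stated assumptions: the trapezoidal discretization, the function name `solve_renewal`, and the test functions $f(t) = t e^{-t}$, $g(t) = e^{-t}$ are illustrative choices and do not come from the text. For this pair $b = 1$ and $m_1 = 2$, and the Laplace transform gives the closed form $u(t) = \tfrac12 + \tfrac12 e^{-2t}$ for comparison.

```python
import math

def solve_renewal(f, g, t_max, h):
    """Solve u(t) = g(t) + int_0^t u(t - x) f(x) dx on a uniform grid.

    The convolution integral is approximated by the trapezoidal rule;
    this discretization is an assumption of the sketch, not a method
    taken from the text.
    """
    n = int(round(t_max / h))
    fs = [f(i * h) for i in range(n + 1)]
    gs = [g(i * h) for i in range(n + 1)]
    u = [gs[0]]  # at t = 0 the integral vanishes, so u(0) = g(0)
    for i in range(1, n + 1):
        # interior nodes x_j = j*h, j = 1..i-1, only need known values of u
        acc = sum(u[i - j] * fs[j] for j in range(1, i))
        acc += 0.5 * u[0] * fs[i]  # end node x = t_i
        # the node x = 0 carries the unknown u_i with weight h*f(0)/2
        u.append((gs[i] + h * acc) / (1.0 - 0.5 * h * fs[0]))
    return u

f = lambda t: t * math.exp(-t)  # integral 1, first moment m1 = 2
g = lambda t: math.exp(-t)      # integral b = 1
h = 0.01
u = solve_renewal(f, g, 10.0, h)
limit = 1.0 / 2.0               # b / m1
exact = [0.5 + 0.5 * math.exp(-2.0 * i * h) for i in range(len(u))]
max_err = max(abs(a - b) for a, b in zip(u, exact))
```

On this grid `u[-1]` should lie close to `limit`, and `max_err` measures the agreement with the closed form over the whole interval.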

It seems that the solution $u(t)$ is generally supposed to oscillate around its
limit $b/m_1$ as $t \to \infty$. It goes without saying that such a behavior is a priori
more likely than a monotone character. It should, however, be noticed that
there is no reason whatsoever to suppose that $u(t)$ always oscillates around its
limit. Again no computation is necessary to see this, as shown by the following

Example: Differentiating (1.1) formally we get

$u'(t) = g'(t) + g(0) f(t) + \int_0^t u'(t - x) f(x)\,dx,$

which shows that, if $g(t)$ and $f(t)$ are sufficiently regular, $u'(t)$ satisfies an integral
equation of the same type as $u(t)$. Thus if

$g'(t) + g(0) f(t) \ge 0$

for all $t$, we shall have $u'(t) \ge 0$, and $u(t)$ is a monotone function. In particular,
if $g'(t) + g(0) f(t) = 0$, then $u'(t) = 0$ and $u(t) = \text{const}$. For example, let
$f(t) = g(t) = e^{-t}$. Then $\varphi(s) = \gamma(s) = 1/(s + 1)$ and hence $\omega(s) = 1/s$, which is the
Laplace transform of $u(t) = 1$. It is also seen directly that $u(t) \equiv 1$ is the
solution. We have however the following

Theorem 5: If the functions $f(t)$ and $g(t)$ of Theorem 4 vanish identically for
$t > T > 0$, then the solution $u(t)$ of (1.1) oscillates around its limit $b/m_1$ as $t \to \infty$.

Footnote: Under some slight additional hypotheses and with quite different methods this
theorem was proved by Richter [16].

Proof: For $t > T$ equation (1.1) reduces to

$u(t) = \int_0^T u(t - x) f(x)\,dx,$

and since $\int_0^T f(x)\,dx = 1$ it follows that the maxima of $u(t)$ in the intervals
$nT < t < (n + 1)T$ form, for sufficiently large integers $n$, a non-increasing sequence.
Similarly the corresponding minima do not decrease. Since $u(t) \to b/m_1$, by
Theorem 4, it follows that the minima do not exceed $b/m_1$ and the maxima are
not smaller than $b/m_1$.

6. On Lotka’s method. Probably the most widely used method for treating 
equation (1.1) in connection with problems of the renewal theory is Lotka’s 
method. As a matter of fact this method consists of two independent parts. 
The first step aims at obtaining the exact solution of (1.1) in the form of a series 
of exponential terms (this is achieved by an adaptation of a method which was 
used by P. Herz and Herglotz for other purposes). The second part of Lotka’s
theory consists of devices for a convenient approximative computation of the 
first few terms of the series. While restricting ourselves formally to Lotka’s 
theory, it will be seen that some of the following remarks apply equally to other 
methods. 

Lotka's method rests essentially on the fundamental assumption that the
characteristic equation

(6.1)    $\varphi(s) = 1$

has infinitely many distinct simple roots $s_0, s_1, \dots$, and that the solution $u(t)$
of (1.1) can be expanded into a series

(6.2)    $u(t) = \sum_k A_k e^{s_k t},$

where the $A_k$ are complex constants. The argument usually rests on an assumed
completeness-property of the roots. Thus, starting from (2.4) it is required that
(6.2) reduces to $h(t)$ for $t < 0$; in other words, that an arbitrarily prescribed
function $h(x)$ be, for $x < 0$, representable in the form

(6.3)    $h(x) = \sum_k A_k e^{s_k x} \qquad (x < 0).$

In practice we are, of course, usually not concerned with $h(t)$ but with $g(t)$ (cf.
(2.5)), and according to Lotka's theory the coefficients $A_k$ of the solution (6.2)
of (1.1) can be computed directly from $g(t)$ in a way similar to the computation
of the Fourier coefficients.

Footnote: Hadwiger [3] objected to the assumption that all roots of (6.1) be simple. The
modifications which are necessary to cover the case of multiple roots also will be
indicated below.

Lotka's method is known to lead to correct results in many cases and also to
have distinct computational merits. On the other hand it seems to require a
safer justification, since its fundamental assumptions are rarely realized. Thus
clearly an arbitrary function $h(x)$ cannot be represented in the form (6.3): to
see this it suffices to note that (6.1) frequently has only a finite number of roots
(cf. also below). It should also be noted that, the series (6.3) having regularity
properties as are assumed in Lotka's theory, any function representable in the
form (6.3) is necessarily a solution of the integral equation (2.4), whereas the
theory requires us to construct a solution $u(t)$ which reduces to an arbitrarily
prescribed function $h(t)$ for $t < 0$ (which frequently is an empirical function,
determined by observations). Nevertheless, it is possible to give sound foundations
to Lotka's method so that it can be used (with some essential limitations
and modifications) sometimes even in cases for which it originally was not
intended. For this purpose it turns out to be necessary that all considerations
be based on the more general equation (1.1), instead of (2.4) (cf. also section 2).

Before proceeding it is necessary to make clear what is really meant by a root of
(6.1). The function $\varphi(s)$ is defined by (3.2), and the integral will in general
converge only for $s$-values situated in the half-plane $\Re(s) > \sigma$. Usually only
roots situated in this half-plane are considered. It is also argued that $\varphi(s)$
is, for real $s$, a monotone function, so that (6.1) has at most one real root:
accordingly the terms of (6.2) are called "oscillatory components." However,
the function $\varphi(s)$ can usually be defined by analytic continuation even outside
the half-plane $\Re(s) > \sigma$, and, if this is done, (6.1) will in general also have roots
in the half-plane $\Re(s) < \sigma$. It will be seen in the sequel that these roots play
exactly the same role for the solution $u(t)$ as the other ones, and that the
applicability of Lotka's method depends on the behavior of $\varphi(s)$ in the entire
complex $s$-plane. It may be of interest to quote an example where (6.1) has
infinitely many real and no other roots.

Example: Let

(6.4)    $f(t) = \frac{1}{2\sqrt{\pi}}\, t^{-3/2} e^{-1/(4t)}, \qquad t > 0;$


Footnote: This was stated in particular by Hadwiger [3] and Hadwiger and Ruchti [6];
accordingly the results of the latter paper (obtained by methods quite different from
Lotka's) need some modifications.

Footnote: Cf. the example at the end of section 4. A function closely related to (6.4)
plays an important role in two recent papers by Hadwiger [4] and [5]. Hadwiger's
conclusion, if it could be justified, would fundamentally change the aspect of the whole
theory. The conclusion reached by Hadwiger seems to be that for any biological
population the reproduction function should be of the form $u(t) = \sum u_n(t)$, where $u_n(t)$
represents the contribution of the nth generation and

(*)    $u_n(t) = \frac{na}{\sqrt{\pi}\, t^{3/2}}\, e^{-At + Cna - n^2 a^2 / t}.$

Here $a$, $A$ and $C$ are constants. Clearly (*) is a generalisation of (6.4). Now his conclusion
is based on the arbitrary assumption that $u_n(t)$ should be of the form $u_n(t) = \psi(t, na)$,

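It is easily seen that $\varphi(s) = e^{-\sqrt{s}}$. The integral (3.2) converges only for $\Re(s) \ge 0$,
but $\varphi(s)$ is defined as a two-valued function in the entire $s$-plane. The roots of
(6.1) are obviously $s_k = -4k^2\pi^2$, so that all of them are real and simple. If
$g(t) = f(t)$, we get by (3.4)

$\omega(s) = \frac{e^{-\sqrt{s}}}{1 - e^{-\sqrt{s}}} = \sum_{n=1}^{\infty} e^{-n\sqrt{s}}, \qquad s \text{ real}, > 0.$

Now $e^{-n\sqrt{s}}$ is the Laplace transform of $\frac{n}{2\sqrt{\pi}}\, t^{-3/2} e^{-n^2/(4t)}$; hence it is readily
seen that the solution $u(t)$ can be written in the form

$u(t) = \frac{1}{2\sqrt{\pi}}\, t^{-3/2} \sum_{n=1}^{\infty} n\, e^{-n^2/(4t)};$

of course, this expansion is not of form (6.2) and shows no oscillatory character.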
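The statement about the roots can be verified mechanically. The sketch below assumes the reading $\varphi(s) = e^{-\sqrt{s}}$ with roots $s_k = -4k^2\pi^2$, and uses the principal branch of the square root supplied by `cmath.sqrt` (one of the two values of the two-valued function). The helper names `phi` and `u_series` are illustrative only; `u_series` evaluates a truncated version of the series solution $u(t) = \frac{1}{2\sqrt{\pi}} t^{-3/2} \sum n e^{-n^2/(4t)}$.

```python
import cmath
import math

def phi(s):
    # phi(s) = exp(-sqrt(s)); cmath.sqrt selects the principal branch,
    # i.e. one of the two values of the two-valued function
    return cmath.exp(-cmath.sqrt(s))

# candidate roots s_k = -4 k^2 pi^2 of the characteristic equation phi(s) = 1
roots = [-4.0 * k * k * math.pi ** 2 for k in range(0, 6)]
residuals = [abs(phi(complex(s, 0.0)) - 1.0) for s in roots]

def u_series(t, terms=200):
    # u(t) = (1 / (2 sqrt(pi))) * t^(-3/2) * sum_{n>=1} n * exp(-n^2 / (4 t)),
    # truncated after `terms` terms (adequate while terms^2 >> 4 t)
    total = sum(n * math.exp(-n * n / (4.0 * t)) for n in range(1, terms + 1))
    return total / (2.0 * math.sqrt(math.pi) * t ** 1.5)

samples = [u_series(t) for t in (1.0, 10.0, 100.0)]
```

Every entry of `residuals` should be at rounding level, and the values in `samples` are positive, in line with the remark that this solution shows no oscillatory character.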

From now on we shall consistently denote by $\varphi(s)$ the function defined by the
integral (3.4) and by the usual process of analytic continuation; accordingly we
shall take into consideration all roots of (6.1). The main limitation of Lotka's
theory can then be formulated in the following way: Lotka's method depends
only on the function $g(t)$ and on the roots of (6.1). Now two different functions
$f(t)$ can lead to characteristic equations having the same roots. Lotka's method
would be applicable to both only if the corresponding two integral equations
(1.1) had the same solution $u(t)$. This, however, is not necessarily the case.
Thus, if Lotka's method is applied, and if all computations are correctly
performed, and if the resulting series for $u(t)$ converges uniformly, there is no
possibility of telling which equation is really satisfied by the resulting $u(t)$:
it can happen that one has unwittingly solved some unknown equation of type
(1.1) which, by chance, leads to a characteristic equation having the same roots
as the characteristic equation of the integral equation with which one was really
concerned. Indeed this happens in the following example which is familiar in
connection with our problem. It is illustrative also for other purposes: thus it
shows not only limitations of Lotka's method, but also that this method can be
modified so as to become applicable in some cases where the characteristic
equation has only a finite number of roots.


where $\psi(x, a)$ is independent of $n$. To my mind Hadwiger's result shows only the
impracticability of this axiom. However, Hadwiger's result is not correct even under his
assumption. Indeed, he derives for $\psi(x, a)$ the functional equation

(**)    $\psi(x, a + b) = \int_0^x \psi(x - \xi, a)\, \psi(\xi, b)\,d\xi,$

which is well-known from the theory of stochastic processes. Now Hadwiger merely
verifies the known result that (*) leads to a solution of (**). However, (**) has infinitely
many other solutions (it is possible to write down expressions for their Laplace transforms,
although it is difficult to express the solutions themselves explicitly). This, of course,
renders Hadwiger's result illusory.

Example: Pearson type III curves. Consider the integral equation (1.1)
in the following two cases:

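(I)    $f(t) = g(t) = f_{I}(t) = \frac{2}{\sqrt{\pi}}\, t^{1/2} e^{-t}$

and

(II)    $f(t) = g(t) = f_{II}(t) = \frac{1}{2}\, t^2 e^{-t}.$

It is readily seen (and well known) that the corresponding Laplace transforms are

(I)    $\varphi_{I}(s) = \frac{1}{(s + 1)^{3/2}}$

and

(II)    $\varphi_{II}(s) = \frac{1}{(s + 1)^{3}},$

respectively. Thus in both cases the characteristic equation has the same roots,
namely

$s_1 = 0, \qquad s_2 = -\frac{3}{2} + \frac{i}{2}\sqrt{3}, \qquad s_3 = -\frac{3}{2} - \frac{i}{2}\sqrt{3},$

of which only the first one lies in the half-plane of convergence of the integral
(3.4). Lotka's method is not applicable since there are only three roots. However,
in the second case, an expansion of type (6.2) is possible. Indeed, we have
by (3.4)

$\omega_{II}(s) = \frac{\gamma_{II}(s)}{1 - \varphi_{II}(s)} = \frac{1}{s^3 + 3s^2 + 3s} = \frac{1}{3} \cdot \frac{1}{s} - \frac{\frac{1}{6} - \frac{i}{2\sqrt{3}}}{s + \frac{3}{2} - \frac{i}{2}\sqrt{3}} - \frac{\frac{1}{6} + \frac{i}{2\sqrt{3}}}{s + \frac{3}{2} + \frac{i}{2}\sqrt{3}};$

now $1/(s + a)$ is the Laplace transform of $e^{-at}$ and hence we obtain the solution
$u(t)$ in the form

$u_{II}(t) = \frac{1}{3} - \left( \frac{1}{6} - \frac{i}{2\sqrt{3}} \right) e^{\left(-\frac{3}{2} + \frac{i}{2}\sqrt{3}\right) t} - \left( \frac{1}{6} + \frac{i}{2\sqrt{3}} \right) e^{\left(-\frac{3}{2} - \frac{i}{2}\sqrt{3}\right) t},$

which is an expansion of type (6.2). In the first of the above examples we get
for real positive $s$

$\omega_{I}(s) = \frac{\varphi_{I}(s)}{1 - \varphi_{I}(s)} = \sum_{n=1}^{\infty} \frac{1}{(s + 1)^{3n/2}},$

and it is readily seen that this is the Laplace transform of the solution

$u_{I}(t) = e^{-t} \sum_{n=1}^{\infty} \frac{t^{3n/2 - 1}}{\Gamma(3n/2)}.$

The series is convergent for $t > 0$, but obviously this solution cannot be
represented in a form similar to (6.2).

Footnote: General Pearson curves have been investigated recently in connection with (1.1)
by Brown [1], Hadwiger and Ruchti [6] and Rhodes [15]. Hadwiger and Ruchti use a method
of their own, but they are also led to the study of the characteristic equation (6.1) in a
slightly disguised form: their result needs a modification since they arbitrarily drop the
roots lying in the half-plane of divergence of the integral $\varphi(s)$.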
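Although the solution in case (I) is not of the form (6.2), Theorem 4 still predicts its limit. The sketch below assumes the reading $u_I(t) = e^{-t} \sum_{n \ge 1} t^{3n/2 - 1} / \Gamma(3n/2)$ of the garbled series, with $f_I(t) = t^{1/2} e^{-t} / \Gamma(3/2)$, so that $b = 1$ and $m_1 = \Gamma(5/2)/\Gamma(3/2) = 3/2$. The function name `u_first_case` and the truncation at 400 terms are illustrative assumptions.

```python
import math

def u_first_case(t, terms=400):
    """Evaluate u_I(t) = exp(-t) * sum_{n>=1} t^(3n/2 - 1) / Gamma(3n/2).

    Each term is formed through logarithms (math.lgamma) so that large
    powers and Gamma values do not overflow; requires t > 0.
    """
    log_t = math.log(t)
    total = 0.0
    for n in range(1, terms + 1):
        a = 1.5 * n
        total += math.exp((a - 1.0) * log_t - math.lgamma(a) - t)
    return total

m1 = math.gamma(2.5) / math.gamma(1.5)  # first moment of f_I, equal to 3/2
limit = 1.0 / m1                         # b / m1 with b = 1
values = [u_first_case(t) for t in (5.0, 10.0, 20.0)]
```

The entries of `values` should approach `limit` (that is, 2/3) as $t$ grows, which is the behavior (5.5) describes for this example.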

A similar remark applies to the general Pearson-type III curve

$f(t) = A\, t^{\beta} e^{-\alpha t},$

where $A$, $\alpha$, $\beta$ are positive constants; the corresponding Laplace transform is

$\varphi(s) = A\, \Gamma(\beta + 1)\, \frac{1}{(s + \alpha)^{\beta + 1}}.$

These preparatory remarks enable us to formulate rigorous conditions for the
existence of an expansion of type (6.2). The following theorem shows the limits
of Lotka's method, but at the same time it also represents an extension of it.
In the formulation of the theorem we have considered only the case of absolute
convergence of (6.2). This was done to avoid complications lacking any practical
significance whatsoever. The conditions can, of course, be relaxed along
customary lines.

Theorem 6: In order that the solution $u(t)$ of Theorem 2 be representable in
form (6.2), where the series converges absolutely for $t \ge 0$ and where the $s_k$ denote the
roots of the characteristic equation (6.1), it is necessary and sufficient that the
Laplace transform $\omega(s)$ admit an expansion

(6.6)    $\omega(s) \equiv \frac{\gamma(s)}{1 - \varphi(s)} = \sum_k \frac{A_k}{s - s_k}$

and that $\sum A_k$ converges absolutely. The coefficients $A_k$ are determined by

(6.7)    $A_k = -\frac{\gamma(s_k)}{\varphi'(s_k)}.$

In particular, it is necessary that $\omega(s)$ be a one-valued function.

Footnote: The number of roots may be finite or infinite. It should also be noted that it is
not required that $s_k \to \infty$. If the $s_k$ have a point of accumulation, $\omega(s)$ will have an
essential singularity. That this actually can happen can be shown by examples.

Footnote: This was not so in our example I.

Proof: All roots $s_k$ of (6.1) satisfy the inequality $\Re(s_k) \le \sigma'$, where $\sigma'$ was
defined in Theorem 2. It is therefore readily seen that in case $\sum |A_k|$ converges,
the Laplace transform of (6.2) can be computed for sufficiently large
positive $s$-values by termwise integration so that (6.6) certainly holds for
sufficiently large positive $s$. Now with $\sum |A_k|$ converging, (6.6) defines $\omega(s)$
uniquely for all complex $s$ (with singularities at the points $s_k$ and the points of
accumulation of $s_k$, if any). Since the analytic continuation is unique, it follows
that (6.6) holds for all $s$. The series $\sum |A_k|$ must, of course, converge if (6.2)
is to converge absolutely for $t \ge 0$, and this proves the necessity of our condition.
Conversely, if $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)}$ is given by (6.6), and if $\sum |A_k|$ converges,
then $\omega(s)$ is the Laplace transform of a function $u(t)$ defined by (6.2).
Since the Laplace transform is unique, $u(t)$ is the solution of (1.1) by Theorem 2.
The series (6.2) converges absolutely for $t \ge 0$ since $|A_k e^{s_k t}| \le |A_k| e^{\sigma' t}$.
Finally (6.7) follows directly from (6.6).
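The coefficient formula can be tried on case (II) of the Pearson example, where $\varphi(s) = \gamma(s) = (s + 1)^{-3}$. The sketch below assumes the reading $A_k = -\gamma(s_k)/\varphi'(s_k)$ of (6.7); the function names are illustrative, and the real closed form in `u_real_form` is obtained here by combining the two conjugate terms, so it is a derived convenience rather than a formula quoted from the text.

```python
import cmath
import math

def gamma_ii(s):
    return 1.0 / (s + 1.0) ** 3   # transform of g(t) = t^2 e^{-t} / 2

def phi_prime_ii(s):
    return -3.0 / (s + 1.0) ** 4  # derivative of phi(s) = (s + 1)^(-3)

# roots of (s + 1)^3 = 1, i.e. s_k = exp(2 pi i k / 3) - 1
roots = [cmath.exp(2j * math.pi * k / 3.0) - 1.0 for k in range(3)]
# coefficients according to (6.7): A_k = -gamma(s_k) / phi'(s_k)
coeffs = [-gamma_ii(s) / phi_prime_ii(s) for s in roots]

def u_expansion(t):
    # expansion of type (6.2): u(t) = sum_k A_k exp(s_k t)
    return sum(a * cmath.exp(s * t) for a, s in zip(coeffs, roots))

def u_real_form(t):
    # the same function with the two conjugate terms combined into real ones
    w = math.sqrt(3.0) / 2.0
    return 1.0 / 3.0 - math.exp(-1.5 * t) * (
        math.cos(w * t) / 3.0 + math.sin(w * t) / math.sqrt(3.0))

checks = [abs(u_expansion(t) - u_real_form(t)) for t in (0.0, 0.5, 1.0, 3.0, 10.0)]
```

The coefficient at the root $s = 0$ comes out as $1/3 = b/m_1$, in agreement with Theorem 4, and `checks` compares the complex expansion with its real form.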

It is interesting to compare (6.7) with formulas (50) and (66) of Lotka's
paper [8]. Lotka considers the special case $g(t) = f(t)$; in this case
$\gamma(s_k) = \varphi(s_k) = 1$, and (6.7) reduces to $A_k = -\frac{1}{\varphi'(s_k)}$. If $s_k$ lies in the domain of
convergence of the integral $\varphi(s) = \int_0^\infty e^{-st} f(t)\,dt$, that is, if $\Re(s_k) > \sigma$, then

(6.8)    $\frac{1}{A_k} = \int_0^\infty e^{-s_k t}\, t f(t)\,dt,$

in accordance with Lotka's result. However, (6.8) becomes meaningless for the
roots with $\Re(s_k) \le \sigma$, whereas (6.7) is applicable in all cases.

Theorem 6 can easily be generalized to the case where the characteristic
equation has multiple roots. The expansion (6.6) (which reduces to the customary
expansion into partial fractions whenever $\omega(s)$ is meromorphic) is to be
replaced by

(6.9)    $\omega(s) = \sum_k \left\{ \frac{A_k^{(1)}}{s - s_k} + \frac{A_k^{(2)}}{(s - s_k)^2} + \cdots + \frac{A_k^{(m_k)}}{(s - s_k)^{m_k}} \right\},$

where $m_k$ is the multiplicity of the root $s_k$. This leads us formally to an
expansion

(6.10)    $u(t) = \sum_k e^{s_k t} \left\{ A_k^{(1)} + A_k^{(2)} \frac{t}{1!} + \cdots + A_k^{(m_k)} \frac{t^{m_k - 1}}{(m_k - 1)!} \right\},$

which now replaces (6.2). Generalizing Theorem 6 it is easy to formulate some
simple conditions under which (6.10) will really represent a solution of (1.1).
Other conditions which ensure that (6.9) is the transform of (6.10) are known
from the general theory of Laplace transforms; such conditions usually use only
function-theoretical properties of (6.9) and are applicable in particular when
$\omega(s)$ is meromorphic. We mention in particular a theorem of Churchill [17]
which can be used for our purposes.


7. On the practical computation of the solution. There are at hand two main
methods for the practical computation of the solution of (1.1). One of them
has been developed by Lotka and consists of an approximate computation of a
few coefficients in the series (6.2). The other method uses an expansion

(7.1)    $u(t) = \sum_{n=0}^{\infty} u_n(t),$

where $u_n(t)$ represents the contribution of the nth "generation" and is defined
by

(7.2)    $u_0(t) = g(t), \qquad u_{n+1}(t) = \int_0^t u_n(t - x) f(x)\,dx.$

Now the Laplace transform of $u_n(t)$ is $\gamma(s)\varphi^n(s)$, and hence (7.1) corresponds
to the expansion

(7.3)    $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)} = \gamma(s) \sum_{n=0}^{\infty} \varphi^n(s).$

In practice the functions $g(t)$ and $f(t)$ are usually not known exactly. Frequently
their values are obtained from some statistical material, so that only
their integrals over some time units, e.g. years, are actually known or, in other
words, only the values

(7.4)    $f_n = \int_{n\delta}^{(n+1)\delta} f(t)\,dt, \qquad g_n = \int_{n\delta}^{(n+1)\delta} g(t)\,dt$

are given, where $\delta > 0$ is a given constant. Ordinarily in such cases some
theoretical forms (e.g. Pearson curves) are fitted to the empirical data and
equation (1.1) is solved with these theoretical functions. Now such a procedure
is sometimes not only very troublesome, but also somewhat arbitrary. Consider
for example the limit of $u(t)$ as $t \to \infty$; this asymptotic value is the main
point of interest of the theory and all practical computations. However, as has
been shown above, this limit depends only on the moments of the first two
orders of $f(t)$ and $g(t)$, and, unless the fitting is done by the method of moments,
the resulting value will depend on the special procedure of fitting. Accordingly
it will sometimes happen that it is of advantage to use the empirical material
as it is, and this can, at least in principle, always be done.

If only the values (7.4) are used it is natural to consider $f(t)$ and $g(t)$ as
step-functions defined by

(7.5)    $f(t) = f_n, \quad g(t) = g_n \qquad \text{for } n\delta \le t < (n + 1)\delta.$

In practice only a finite number among the $f_n$ and $g_n$ will be different from zero:
accordingly the Laplace transforms $\gamma(s)$ and $\varphi(s)$ reduce to trigonometrical
polynomials, so that the analytic study of $\omega(s) = \frac{\gamma(s)}{1 - \varphi(s)}$ becomes particularly
simple. Lotka's method can be applied directly in this case.

For a convenient computation of (7.1) it is better to return to the more general
equation (1.3), instead of (1.1). The summatory functions $F(t)$ and $G(t)$ should
not be defined by (1.2) in this case, but simply by

(7.6)    $F(t) = \sum_{n=0}^{[t/\delta]} f_n, \qquad G(t) = \sum_{n=0}^{[t/\delta]} g_n.$

It is readily seen that the solution $U(t)$ of (1.3) can be written in the form
$U(t) = \sum U_n(t)$, where

$U_0(t) = G(t), \qquad U_{n+1}(t) = \int_0^t U_n(t - x)\,dF(x);$

in our case $U_n(t)$ will again be a step-function with jumps at the points $k\delta$, the
corresponding saltus being

$u_0^{(k)} = g_k, \qquad u_{n+1}^{(k)} = \sum_{r=0}^{k} u_n^{(k-r)} f_r.$

Thus we arrive at exactly the same result as would have been obtained if the
integrals (7.2) had been computed, starting from (7.4), by the ordinary methods
for numerical integration of tabulated functions. It is of interest to note that
this method of approximate evaluation of the integrals (7.2) leads to the exact
values of the renewal function of a population where all changes occur in a
discontinuous way at the end of time intervals of length $\delta$ in such a way that each
change equals the mean value of the changes of the given population over the
corresponding time interval.
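The generation-by-generation computation of the jumps can be sketched in a few lines. This is an illustration under stated assumptions: the saltus recursion is read as $u_0^{(k)} = g_k$, $u_{n+1}^{(k)} = \sum_{r=0}^{k} u_n^{(k-r)} f_r$, the step data `f` and `g` are hypothetical numbers chosen for the example, and $f_0 = 0$ is assumed so that finitely many generations give the exact total. The direct recursion in `saltus_direct` is an equivalent reformulation added here as a cross-check.

```python
from fractions import Fraction

def saltus_by_generations(f, g):
    """Sum the generation contributions u_n^(k) for k = 0..K.

    f and g are lists of step data f_0..f_K and g_0..g_K (hypothetical
    inputs). With f_0 = 0, generation n contributes only for k >= n, so
    K + 1 generations give the exact total on this range.
    """
    K = len(g) - 1
    current = list(g)  # u_0^(k) = g_k
    total = list(g)
    for _ in range(K):
        # u_{n+1}^(k) = sum_{r=0}^{k} u_n^(k-r) f_r
        nxt = [sum(current[k - r] * f[r] for r in range(k + 1))
               for k in range(K + 1)]
        total = [a + b for a, b in zip(total, nxt)]
        current = nxt
    return total

def saltus_direct(f, g):
    """Solve u^(k) = g_k + sum_{r=0}^{k} u^(k-r) f_r directly (needs f_0 != 1)."""
    u = []
    for k in range(len(g)):
        acc = sum(u[k - r] * f[r] for r in range(1, k + 1))
        u.append((g[k] + acc) / (1 - f[0]))
    return u

f = [Fraction(0), Fraction(1, 2), Fraction(1, 4), Fraction(1, 4),
     Fraction(0), Fraction(0)]
g = list(f)
by_gen = saltus_by_generations(f, g)
direct = saltus_direct(f, g)
```

Exact rational arithmetic is used so that the two routes can be compared for equality rather than up to rounding.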
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ON THE JOINT DISTRIBUTION OF THE MEDIANS IN SAMPLES 
FROM A MULTIVARIATE POPULATION

By A. M. Mood
University of Texas

It is well known [1] that in the case of a population having a single variate
distributed according to a density function satisfying certain general conditions, 
the median of a sample is asymptotically normally distributed about the popula- 
tion median as a mean. It is the purpose of this paper to extend this result to 
populations involving more than one variate. Besides the theoretical interest 
of such a result, there may be some practical value in it when one is dealing with 
samples from a population for which the median is a more efficient statistic than 
the mean, as, for example, when the population variance is not finite. 

The complexity of the exact distribution of the sample median increases 
rapidly with the number of variates which describe the population; it is almost 
impossible to write out completely the distribution for the general case of k 
variates. For this reason the author has chosen to give first a detailed presenta- 
tion for the case of two variates, then use a condensed notation to establish the 
general result. This is a circuitous route, but it seems to be the only feasible one. 
A condensed notation is necessary for the general case, but presented alone it 
would be well-nigh incomprehensible. 


1. Distribution of the median in two dimensions. An extension of A. T.
Craig's [2] geometrical argument will be used to obtain the exact distribution of
the sample median. Let us consider two variates $x_1$ and $x_2$ with density function
$f(x_1, x_2)$ which shall satisfy the following conditions:

1. $f(x_1, x_2) \ge 0$

3. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1\,dx_2 = 1$

4. Each of the equations

$\int_{-\infty}^{x_1} \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1 = \tfrac12$

$\int_{-\infty}^{\infty} \int_{-\infty}^{x_2} f(x_1, x_2)\,dx_2\,dx_1 = \tfrac12$

has a unique real root.

If $\xi_1$ and $\xi_2$ are the respective roots of the two equations of this last condition
then the point $(\xi_1, \xi_2)$ is defined to be the population median. It will be assumed
in what follows that the coordinate system has been so chosen that $\xi_1 = 0 = \xi_2$.

Let a sample of $2n + 1$ elements $(x_{1\alpha}, x_{2\alpha})$ $(\alpha = 1, 2, \dots, 2n + 1)$ be drawn
from this population. The sample median $(\tilde{x}_1, \tilde{x}_2)$ will be defined as an element
(not necessarily in the sample) whose $x_1$ coordinate is the middle, with respect
to magnitude, number of the set of numbers $x_{1\alpha}$, and whose $x_2$ coordinate is the
middle number of the set of numbers $x_{2\alpha}$. Now let us compute the probability
that the sample median will lie in the rectangle

$\tilde{x}_i - \tfrac12\, d\tilde{x}_i < x_i < \tilde{x}_i + \tfrac12\, d\tilde{x}_i, \qquad i = 1, 2.$

This rectangle will be denoted by $R''$. The remainder of the plane will be divided
into eight other regions $R_1, \dots, R_4'$ as indicated by the dotted lines in figure 1.
The probability that an element will fall in a given region will be denoted by
$p$ with the subscript and primes of that region.

Fig. 1

Neglecting terms involving differentials of higher order we have

$p_1 = \int_{\tilde{x}_1}^{\infty} \int_{\tilde{x}_2}^{\infty} f(x_1, x_2)\,dx_2\,dx_1$

$p_2 = \int_{-\infty}^{\tilde{x}_1} \int_{\tilde{x}_2}^{\infty} f(x_1, x_2)\,dx_2\,dx_1$

(1)

$p_1' = \int_{\tilde{x}_1}^{\infty} f(x_1, \tilde{x}_2)\,dx_1\,d\tilde{x}_2$

$p'' = f(\tilde{x}_1, \tilde{x}_2)\,d\tilde{x}_1\,d\tilde{x}_2$

We shall consider now that the sample is drawn from a multinomial population
with probabilities $p_1, \dots, p''$ and pick out those terms which give rise to a
sample median in $R''$. If the median is an element of the sample, then that
element must fall in $R''$ and the other elements must fall in the regions $R_1$,
$R_2$, $R_3$, and $R_4$ in such a manner that

$n_1 + n_2 = n_3 + n_4 = n$

$n_1 + n_4 = n_2 + n_3 = n$

or so that

(2)    $n_1 = n_3$  and  $n_2 = n_4$

where $n_i$ is the number of elements in $R_i$. The probability that this occurs is

(3)    $\sum_{n_1 + n_2 = n} \frac{(2n + 1)!}{n_1!^2\, n_2!^2}\; p''\, p_1^{n_1} p_2^{n_2} p_3^{n_1} p_4^{n_2}.$

Now suppose the median is determined by two different elements of the sample,
for example one in $R_1'$ and one in $R_2'$, then there must be $n_1$ elements in $R_1$,
$n_1 + 1$ elements in $R_3$, and $n_2$ elements in each of $R_2$ and $R_4$ with

(4)    $n_1 + n_2 = n - 1.$

The probability in this case is

(5)    $\sum \frac{(2n + 1)!}{n_1!\,(n_1 + 1)!\; n_2!^2}\; p_1' p_2'\, p_1^{n_1} p_2^{n_2} p_3^{n_1 + 1} p_4^{n_2}.$

Continuing in this manner we obtain the distribution of the median, and letting
$D(\tilde{x}_1, \tilde{x}_2)$ represent the density function giving this distribution we have

(6)    $D(\tilde{x}_1, \tilde{x}_2)\,d\tilde{x}_1\,d\tilde{x}_2 = p'' \sum \frac{(2n + 1)!}{n_1!^2\, n_2!^2} (p_1 p_3)^{n_1} (p_2 p_4)^{n_2}$
$\qquad + (p_3 p_1' p_2' + p_1 p_3' p_4') \sum \frac{(2n + 1)!}{n_1!\,(n_1 + 1)!\; n_2!^2} (p_1 p_3)^{n_1} (p_2 p_4)^{n_2}$
$\qquad + (p_2 p_1' p_4' + p_4 p_2' p_3') \sum \frac{(2n + 1)!}{n_1!^2\; n_2!\,(n_2 + 1)!} (p_1 p_3)^{n_1} (p_2 p_4)^{n_2}.$


2. Asymptotic distribution of the median in two dimensions. As a simple
notation

$A = B\,(1 + O(1/\sqrt{n}))$

will be abbreviated to read

(7)    $A \doteq B,$

the dot after the equality sign indicating the omission of the factor $1 + O(1/\sqrt{n})$.
As is customary, the second term of this factor represents any function such that

$\lim_{N \to \infty} N\, O(1/N) = L < \infty.$

In order to get an approximation to (6) for large $n$ we shall use the normal
approximation for the multinomial distribution and compute the sums (these
cannot be put in finite form) by integration. We use then the well-known result

(8)    $m! \prod_{i=1}^{r} \frac{p_i^{m_i}}{m_i!} \doteq \left[ \frac{|A|}{(2\pi)^{r-1}} \right]^{1/2} \exp\left( -\tfrac12 \sum A_{ij} z_i z_j \right) \prod_{i=1}^{r-1} dz_i,$

where

(9)    $z_i = (m_i - m p_i)/\sqrt{m}, \qquad i = 1, 2, \dots, r - 1,$

(10)    $A_{ii} = \frac{1}{p_i} + \frac{1}{p_r}, \qquad A_{ij} = \frac{1}{p_r}.$

Returning to (6) it is to be noted that the fraction immediately following $\Sigma$
in the first sum has one more factor in the denominator than the corresponding
fractions in the other sums. This first sum may therefore be neglected in the
asymptotic form as it is of order $1/n$ in comparison with the others. We
consider now the second sum in (6) and let it be represented by the letter $S$


(2TI l) 

Vi pi Pz Pi • 


(11) S = 2n(2n + Dpjpa E „ xnt « n ' 

Employing (8) and omitting certain terms of order l/n we have 

(12) S = • 4n®piP« £ M/(2ir)’]* exp i E AijZiZ^ dzi dzz dzt, 
in which the A <, are defined by (10) with r — i, and 

(13) Zi — {rii — ’Lwp^ly/^ , i = 1, 2, 3. 

In view of the relations (2) between the n< we h&ve 

z% = (i - Pi - Pj) - Zi - Ui- Zi 

(14) _ 

z» = (Pi - P») — *1 = w* - *1 , 


in which relations we have defined the new symbols $u_1$ and $u_2$. It will be recalled
that in (8) the factors $dz_i$ correspond to factors $1/\sqrt{m}$; we therefore let $dz_2$ and
$dz_3$ in (12) cancel a factor $2n$ from the coefficient of the exponential, and after
substituting (14) in (12) find that


( 16 ) 


5 * • 2np[pi S U/(2»)*l* + 


\ P4 p* p»/ 


+ 


(«1 + U»)* 

P4 


+ 
The summation can now be performed to within terms of order $1/\sqrt{n}$ by
integration with respect to $z_1$ between the limits $-\infty$ and $+\infty$; this gives us


(16) 




+ - + -Y 

pi Pi 




(ui + Ui)* 

V< 


^ + i + 1 + 1)]}. 

Pi \ Pa Pi Pi/ / \pi P2 Pi Pi/JJ 


At this point some new symbols are required. We let $q_i$ and $q'_i$ represent the
results of replacing f i and Ja by zero in the integrals of the relations (1) 



~ i i ^ 

q'l = f /(*!, 0) dxi 

Jo 


g* = f f /(*!, Xa) dxidxa 

g» = / /(O, Xi) dxt 

(17) 

00 »0 

itO ^0 

J0 


«» “ / / f(xi,Xt) dxidzt 

•Loo •Lao 

g» = 9) dxi 


^ [ L /(®‘ > 

f® 

g« = ^ 

then 



(18) 

gj + g» = g» + g« “ gi + g« = ft + g* = J 

and 



(19) 

ft = g» » 

qt ’= qt- 

Also we let 


(20) 

oi * gj + gj , 

ft * gi + gi , 

(21) 

Vi = oi*i > 

y» “ ft*» • 

We have now 



Pi <=• qt, 

i =* 1, 2, 3, 4, 

(22) 

II 

t' “ 1, 3, 


p'i * . Qid&i , 

* = 2, 4. 

Also 
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Similarly . 
(24) 


- V'2n ‘*®‘ ’ 

3^ f . 0) dxi 


Va»‘ 

\/ 2fi dsfs 

“• y*. 

t<s >= \/^ (Pl - P») 
* • — (yi + y*). 


1 , 


The result of substituting (22), (23) and (24) in (16) with some further
simplification using (18) and (19) is


(26) 


o 2nq[<it 

S == • — exp 
2ir^/ qiq2 


(4 


y! - 4(gi - qi)yiyt + yl 
4gig* 


dii d$s . 


The other three sums of (6) will give rise to the same expression except that the
factors $q'_i q'_j$ will be different; it is clear then that

D(i., ft) =■ M 

2ry/qiq2 


X (-i ~ 7 + '^ ) jftiift 

\ 2 4qiqi ) 


(26) 

(27) 


2naia3 


2ry/qi 
1 

27r\/qiqt 


(1% { (Zi 

= exp I — n — 
Mi ' 


- 4(gi - qiiyiVt 4- Vi ) 

4gigs 

o* $i — 4(yi — 9a)ui oj 5* *1* o| 


exp 


45.5. 


This is the asymptotic form for the distribution of the median in two dimensions.


3. Distribution of the median in k dimensions. We consider now a population
characterized by a density function $f(x_1, \ldots, x_k)$ defined over a euclidean space
of k dimensions satisfying conditions like those required of $f(x_1, x_2)$ in section 1,
and we assume that the population median is at the origin so that the integral
of the density function over any half-space determined by a coordinate
hyperplane is $\tfrac12$.

A sample of $2n + 1$ elements will have a median $(\tilde{\xi}_1, \ldots, \tilde{\xi}_k)$, each coordinate
of which is the middle number of the set of numbers giving the corresponding
coordinate of the elements of the sample. To obtain the probability that the
sample median lies in the hyperparallelepiped $\tilde{\xi}_\alpha - \tfrac12 d\tilde{\xi}_\alpha < x_\alpha < \tilde{\xi}_\alpha + \tfrac12 d\tilde{\xi}_\alpha$
($\alpha = 1, 2, \ldots, k$), we divide the space into $3^k$ regions by means of hyperplanes
perpendicular to the coordinate axes through the points $\tilde{\xi}_\alpha \pm \tfrac12 d\tilde{\xi}_\alpha$ on the
coordinate axes. These regions are illustrated in Figure 2 for the case of three
dimensions. The coordinate axes have been omitted in this figure. There
will be $2^k$ primary regions denoted by $R_1, R_2, \ldots, R_{2^k}$ corresponding to the
octants of the figure; $k2^{k-1}$ regions with one differential dimension denoted by
$R'_1, R'_2, \ldots, R'_{k2^{k-1}}$ corresponding to the quarter slabs of the figure; $\binom{k}{2}2^{k-2}$
regions with two differential dimensions corresponding to the half strips of the
figure, and so forth. Probabilities associated with these regions are defined by

Pi^ = f(xi , . • • , X*) da:i • • • da:* . 



If the sample median is determined by k different elements of the sample there
will be one of these k elements in each of k regions $R'_i$ whose differential
dimensions are mutually orthogonal and the other elements of the sample will fall in
the regions $R_i$ in such a way that n elements of the sample will lie on either
side of any of the k hyperplanes $x_\alpha = \tilde{\xi}_\alpha$. The probability of this occurrence
for a particular choice of k of the regions $R'_i$ is


(28) 


5 = np;.z 


2 * 




n PV 

11 Tlil •-1 


in which the 2* indices n< are subject to k independent restrictions of the type 


(29) 


2'ni n — Ca , 
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where c. is an integer such that 0 < c. < A;, and the prime on X indioBtes that 
the sum is to be taken over all n< on one side of a hyperplane Za ^ Sa • n< is 
the number of elements in Rt and besides the k conditions (29) we have also 

t* 

(30) £ n< = 2n — fc + 1. 

1 

In order to include all ways in which the median is determined by k different 
elements of the sample we must add together 2*’**^" sums of the t 3 rpe (28). If 
the median is determined by fewer than k elements, say $k - h$ elements, then the
fraction $(2n + 1)!/\prod n_i!$ will have h extra factors in the denominator and hence
the sum will be of order $1/n^h$ as compared with that of (28) and may be neglected
in obtaining an asymptotic expression.

Thus we need only find the limiting form of (28) 

^ = (2n + l)(2n) . . . (2n - A: + 2) H Pi Z ft P?‘, 

which after substituting (8) and neglecting terms of lower order becomes 

(31) S = . (2n)‘ n Pi. Z (4/(2*-)**“*)* exp (-§ 2 11 ds*, 

1 

in which the Atj are defined by (10) with r =■ 2* and 

(32) Zi = (n, — 2np,)/\/^> i = 1, 2, • • • , 2* — 1. 
Now we define 

(33) Ua = y/Xhih — ^'Pi), a “ 1, 2, . . • , A, 

the Z' having the same significance as in (29). These conditions (29) may now 
be put in the form 


Z. — *11. Z/a(z), 

in which La(z) is a sum of a certain subset of the variables zi^-i , • • • , Zt^-i . 
Care must be taken in labeling the regions Ri in order to be able to solve for 
2 i , • • • , z* in this form. After substituting these relations in (31) we replace 

k 

H dza by (l/2n)*^* and perform the summation to within terms of order l/Vn 
1 

by integrating the remaining from — <» to + o© ; the result is 

(34) 8 - . (2n/2T)*'* H Pi, V® exp 2 B, 0 U,u^, 

in which the Bat are functions of the p< , and B » | Baf |. As in (17) and (20) 
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we define 

“ / /(®1, •• • f **)ndx« 

•'it 

(35) g,- = ( /(*i , • • • , ir*)n' dx, 

•'Si 

flat * f t * * * t dXa s* S g< , 

in which Ri is the set of regions bounded by the coordinate hyperplanes Ri 
are regions into which the coordinate h 3 T>erplanes are divided by the remaining 
coordinate hyperplanes. W indicates that one of the differentials is omitted 
and the variate corresponding to that differential is put equal to zero in 
/(xi , • • • , x») ; 2' indicates the sum over all q' determined by regions lying in 
the hyperplane x« = 0. It is clear that 


(36) 


where 


tt a 

* 

K-a = ••y/2w ) 

/J-1 


dafi = =tl or 0, and yp == \/ 2 nafi 0 . 

Making these substitutions in (34) we have 

(37) S - . (2n/2jr)*« H «*'. VC exp (-n Z Ca 0 a„aM^ II dia , 

and adding together all possible sums of the type (28) we have the asymptotic
form of the distribution of the sample median

(38) 2)(*1, ... 

* ✓ * 
aay/C exp I — n 2 

(39) = . (l/2ir)*'*VC exp ( - i 2 C-oP-I/ij) II dy„ , 

in which the Ca$ are functions of the g< . 


= .(2n/2r)*'*n 
1 


Caffhflf&Jtfi ) II d&a 


4. The case of three dimensions. The computation of the coefficients $C_{\alpha\beta}$
of (39) requires the evaluation of a determinant of order $2^k - k$ for each one of
them. This work was quite laborious even for $k = 3$ and the author made
no attempt to find their explicit expression for larger values of k.

If we let a subscript $+$ indicate integration of the density function
$f(x_1, x_2, x_3)$ from $0$ to $\infty$, and a subscript $-$ indicate integration from $-\infty$ to $0$,
as for example

rrc f(xi, Xt, a^dxtdxtdxt, 

then the $q_i$ of (35) will be defined as follows


/+++ 

9i “/-++ 

— 

93 * I"" 

/+-4- 

9s »/— f 

/4- ' 

9* “/ — 


The coefficients Cafi may be written 

DCn = 2(gi + qt){qi + g#) 
DCta = 2(gi + qt){Qt + qt) 
DCn “ 2(gi + qt)iqt + 84) 


(41) 

DCu = gs^'fi + QiQt — 

8197 — 

m 

DCis = gjgs + 9497 

8i96 - 

9393 

where 

DCss = Qiqi + gegr — 

9i93 - 

9393 


D — 818*9384 f— + — + — + — ^ + qtqtQ 7 q»(~ + — + — + — ^ 

\3i 8s 9* W V9» 9* 9s W 

+ 2(93 + qt){qi + q»){qiqt + 9394) 

(42) 

+ 2(93 + 87)(8* + 9*)(9i93 + 8 * 94 ) 
+ 2(83 + 9s)(8e + 9r)(8i94 + 8 * 93 ) 
+ 8(8i848e8r + 9*9»9»9*) 

(41) and (42) can of course be put in different forms by using the four relations 
,bf tween the 8< . The o« of (38) are defined in (36) ; for A: »» 3 they are 

Cl = ^ £ /(O, x», xi)dxtdxt 
ot- f f f(xi, 0 ,xt)dxidx$ 

«L.«q «L.«o 

as « cl: f(xi, Xt, 0) dxidxt. 


( 43 ) 
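5. The normal distribution in two dimensions. If the density function of
the second section of the paper is normal

(44) $$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1 x_2}{\sigma_1\sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)\right],$$

we find that the parameters of (26) are

(45) $$q_1 = \frac14 + \frac{1}{2\pi}\sin^{-1}\rho, \qquad a_i = \frac{1}{\sqrt{2\pi}\,\sigma_i}, \quad i = 1, 2.$$

These give an interesting result: the correlation coefficient of the asymptotic
distribution of the sample medians is

(46) $$\rho_m = \frac{2}{\pi}\sin^{-1}\rho,$$

hence

(47) $$|\rho_m| \le |\rho|,$$

the equality sign holding only when $\rho = 0$ or $\pm 1$.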
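Relations (46) and (47) are easy to tabulate. The sketch below assumes only the arcsine formula stated in (46); the function name is illustrative.

```python
import math


def median_correlation(rho):
    """Correlation of the asymptotic distribution of the sample medians
    for a bivariate normal population with correlation rho, as in (46)."""
    return (2.0 / math.pi) * math.asin(rho)


if __name__ == "__main__":
    for rho in (-1.0, -0.5, 0.0, 0.5, 0.9, 1.0):
        rho_m = median_correlation(rho)
        # (47): |rho_m| never exceeds |rho|
        print(f"rho = {rho:+.2f}   rho_m = {rho_m:+.4f}")
```

At rho = 0.5 the arcsine equals pi/6, so the medians have correlation 1/3, noticeably smaller than that of the population.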
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SAMPLES FROM TWO BIVARIATE NORMAL POPULATIONS' 

By Chung Tsi Hsu 
Columbia University 

1. Introduction. In multivariate analysis involving p variates, or in analysis
of variance of m samples from univariate populations, we are often interested
in the hypothesis of the equality of variances; viz., that

$\sigma_1 = \sigma_2 = \cdots = \sigma_p$, in the case of p variates;
or

$\sigma_1 = \sigma_2 = \cdots = \sigma_m$, in the case of m samples.

As a matter of fact, it seldom occurs that these hypotheses are true, but the
ratio between the variances might be known.

Hotelling [5] has suggested that if

$$\sigma_1^2/k_1 = \sigma_2^2/k_2 = \cdots = \sigma_m^2/k_m,$$

where the k's are known constants, we can apply the transformation

$$x_1' = w_1 x_1, \quad x_2' = w_2 x_2, \quad \ldots, \quad x_m' = w_m x_m,$$

where

$$w_1\sqrt{k_1} = w_2\sqrt{k_2} = \cdots = w_m\sqrt{k_m} = 1,$$

so that after transformation the variances become equal, i.e.,

$$\sigma_1' = \sigma_2' = \cdots = \sigma_m',$$

and the required analysis can be carried out. This method is similarly
applicable in the multivariate case.

In a previous paper [7], I developed a series of hypotheses concerning samples
from a bivariate normal population under the assumption that

$$\sigma_1 = \sigma_2.$$

In case $\sigma_1^2/k_1 = \sigma_2^2/k_2$, where $k_1$ and $k_2$ are two distinct known constants,
similar results may be obtained by the use of the transformation $x_1' = w_1 x_1$;
$x_2' = w_2 x_2$; where $w_1\sqrt{k_1} = w_2\sqrt{k_2} = 1$.


^ Presented to the American Mathematical Society at Washington, D. C., May 3, 1941. 
In multivariate analysis, the hypotheses usually of interest concerning
correlation coefficients may be classified in two categories, viz.,

(i) that the correlation coefficient is equal to a specified value, e.g., in
simple correlation $\rho_{12} = \rho_0$, in partial correlation $\rho_{12.3} = \rho_0$, in multiple
correlation $\rho_{1.23} = \rho_0$, or in correlation between two sets of variates
[4]*, $Q = Q_0$; of special interest is the hypothesis of the vanishing of
such correlation coefficients.

(ii) that two given correlation coefficients are equal, e.g., (1) correlation
coefficients $\rho_{12}$ and $\rho_{13}$ in the correlation matrix of a multivariate
distribution are equal (Hotelling [6]), or (2) the correlation coefficients $\rho_{12}$ and
$\rho_{12}'$ in two bivariate populations are equal.

R. A. Fisher in his earlier paper [3] introduced the transformation

$$z = \frac12 \log\frac{1+r}{1-r},$$

which provides a very satisfactory, though approximate, method for
the comparison of two correlation coefficients. Brander [1] treated the same
problem by the method of the likelihood ratio criterion.
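Fisher's transformation can be sketched in a few lines. The function name below is illustrative; the only fact assumed beyond the text is that the transformation coincides with the inverse hyperbolic tangent.

```python
import math


def fisher_z(r):
    """Fisher's transformation z = (1/2) log((1 + r) / (1 - r)), for |r| < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


if __name__ == "__main__":
    for r in (-0.8, 0.0, 0.3, 0.95):
        # math.atanh gives the same value and is shown for comparison
        print(f"r = {r:+.2f}   z = {fisher_z(r):+.6f}   atanh(r) = {math.atanh(r):+.6f}")
```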

The present paper is an attempt to obtain different criteria by the likelihood
ratio method (Neyman and Pearson [9], [10], [11]) for testing, by means of
samples, the equality of correlation coefficients in two bivariate normal
populations under the following sets of conditions: (1) $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$; (2) $\sigma_1 = \sigma_2$,
$\xi_1 = \xi_2$ and $\sigma_1' = \sigma_2'$, $\xi_1' = \xi_2'$. The results may be extended to the cases (3)

ffiA. = orl/ksandaiy/ci = ; (4) <r!/k, = al/ks , = (j/ks and a'l^/ki = 

<rs^/ks , = (s^/ki , where the fc’s are known constants. 


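2. The hypotheses. Two samples, each being of two variates $(x_1, x_2)$ and
$(x_1', x_2')$, of size N and N', are supposed to be drawn at random, respectively,
from two independent normal bivariate populations, with the following
distributions:

(1) $$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\xi_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\xi_1)(x_2-\xi_2)}{\sigma_1\sigma_2} + \frac{(x_2-\xi_2)^2}{\sigma_2^2}\right]\right\} dx_1\,dx_2,$$

(2) $$\frac{1}{2\pi\sigma_1'\sigma_2'\sqrt{1-\rho'^2}} \exp\left\{-\frac{1}{2(1-\rho'^2)}\left[\frac{(x_1'-\xi_1')^2}{\sigma_1'^2} - \frac{2\rho'(x_1'-\xi_1')(x_2'-\xi_2')}{\sigma_1'\sigma_2'} + \frac{(x_2'-\xi_2')^2}{\sigma_2'^2}\right]\right\} dx_1'\,dx_2',$$

where $\xi_1, \xi_2, \sigma_1, \sigma_2, \rho$; $\xi_1', \xi_2', \sigma_1', \sigma_2', \rho'$ are the unknown parameters of the
populations.

The hypotheses to be considered in the present paper are:

$H_1$: Assuming $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$, to test $\rho = \rho'$.

$H_2$: Assuming $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$, and $\sigma_1' = \sigma_2'$, $\xi_1' = \xi_2'$, to test $\rho = \rho'$.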


* See bibliography at the end of the paper. 
The derivation and the distribution of the criteria for testing these hypotheses
may be simplified by the following simultaneous transformations:

(3) $$X = \frac{x_1 - x_2}{\sqrt2}, \qquad Y = \frac{x_1 + x_2}{\sqrt2},$$

(4) $$X' = \frac{x_1' - x_2'}{\sqrt2}, \qquad Y' = \frac{x_1' + x_2'}{\sqrt2}.$$

The corresponding normal bivariate distributions in the transformed variables
$(X, Y)$ and $(X', Y')$ are obtained, viz.


(5) 




2ir<rx 1 pxy 


( 6 ) 




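The conditions corresponding to

(7) $$\sigma_1 = \sigma_2 \quad\text{and}\quad \sigma_1' = \sigma_2',$$

are that

(8) $$\rho_{XY} = 0 \quad\text{and}\quad \rho_{XY}' = 0.$$

Also, for a given $\rho$ and $\rho'$, we have from (7)

(9) $$\sigma_Y^2 = \gamma\sigma_X^2 \quad\text{and}\quad \sigma_Y'^2 = \gamma'\sigma_X'^2,$$

where

(10) $$\gamma = \frac{1+\rho}{1-\rho} \quad\text{and}\quad \gamma' = \frac{1+\rho'}{1-\rho'}.$$

Following the notation of (9) and (10), the hypotheses $H_1'$ and $H_2'$ corresponding
to $H_1$ and $H_2$ are:

$H_1'$: Assuming $\rho_{XY} = 0$ and $\rho_{XY}' = 0$, to test $\gamma = \gamma'$.

$H_2'$: Assuming $\rho_{XY} = 0$, $\xi = 0$, and $\rho_{XY}' = 0$, $\xi' = 0$, to test $\gamma = \gamma'$.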
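The effect of the transformation (3) on the second moments can be checked directly. The sketch below works with population variances only; the function names are illustrative, and the formulas are the standard variance and covariance of a sum and a difference.

```python
def transformed_moments(sigma1, sigma2, rho):
    """Variances and covariance of X = (x1 - x2)/sqrt(2), Y = (x1 + x2)/sqrt(2)
    when x1, x2 have standard deviations sigma1, sigma2 and correlation rho."""
    cross = rho * sigma1 * sigma2
    var_x = (sigma1 ** 2 + sigma2 ** 2 - 2.0 * cross) / 2.0
    var_y = (sigma1 ** 2 + sigma2 ** 2 + 2.0 * cross) / 2.0
    cov_xy = (sigma1 ** 2 - sigma2 ** 2) / 2.0  # zero exactly when sigma1 == sigma2
    return var_x, var_y, cov_xy


def gamma_from_rho(rho):
    """The ratio gamma = (1 + rho) / (1 - rho) of relation (10)."""
    return (1.0 + rho) / (1.0 - rho)


if __name__ == "__main__":
    var_x, var_y, cov_xy = transformed_moments(2.0, 2.0, 0.5)
    print(var_x, var_y, cov_xy, var_y / var_x, gamma_from_rho(0.5))
```

With equal standard deviations the covariance vanishes and the variance ratio equals gamma, which is the content of (8) and (9).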


3. The derivation of the criteria. Let $(x_{1i}, x_{2i})$, $(x_{1j}', x_{2j}')$ be the measurements
of the characters on the ith and jth individuals in the two samples from their
respective populations. After transformation, the corresponding measurements
become $(X_i, Y_i)$ and $(X_j', Y_j')$. Let $p(E)$ denote the joint elementary
probability law of the N and N' observations, $E = (X_1, \ldots, X_N, Y_1, \ldots, Y_N; X_1', \ldots, X_{N'}', Y_1', \ldots, Y_{N'}')$.

Following Neyman and Pearson, we shall use $\Omega$ to designate the class of
admissible populations under conditions which can be assumed to be satisfied in
any case; and $\omega$ to designate a subclass of $\Omega$ under conditions which are satisfied
only if the hypothesis to be tested is true.

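Thus for $H_1'$, $\Omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, any real values of $\xi, \eta, \xi', \eta'$ and
any positive values of $\sigma_X, \sigma_Y, \sigma_X', \sigma_Y'$; $\omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, any real
values of $\xi, \eta, \xi', \eta'$ and any positive values of $\sigma_X, \sigma_X'$ and $\gamma$ which are defined by (9).
While for $H_2'$, $\Omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, $\xi = \xi' = 0$, any real values of $\eta$ and $\eta'$
and any positive values of $\sigma_X, \sigma_Y, \sigma_X', \sigma_Y'$; $\omega$ specifies $\rho_{XY} = \rho_{XY}' = 0$, $\xi = \xi' = 0$,
any real values of $\eta$ and $\eta'$, and any positive values of $\sigma_X, \sigma_X'$ and $\gamma$ which are defined
by (9).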

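For our hypothesis $H_1'$, the values of the parameters required to make $p(\Omega)$
a maximum are:

$$\xi = \bar{X}, \quad \eta = \bar{Y}, \quad \sigma_X = s_X, \quad \sigma_Y = s_Y,$$

$$\xi' = \bar{X}', \quad \eta' = \bar{Y}', \quad \sigma_X' = s_X', \quad \sigma_Y' = s_Y'.$$

Thus

$$p(\Omega \max) = \left(\frac{1}{2\pi}\right)^{N+N'} \frac{1}{s_X^N s_Y^N s_X'^{N'} s_Y'^{N'}}\, e^{-N-N'}.$$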

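To obtain $p(\omega \max)$, let us define, according to the notation in the writer's
previous paper [7],

$$R = \frac{2 r s_1 s_2}{s_1^2 + s_2^2}, \qquad R' = \frac{2 r' s_1' s_2'}{s_1'^2 + s_2'^2},$$

and

$$u = \frac{s_Y^2}{s_X^2} = \frac{1+R}{1-R}, \qquad u' = \frac{s_Y'^2}{s_X'^2} = \frac{1+R'}{1-R'}.$$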

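Then the values making $p(\omega)$ a maximum are:

$$\xi = \bar{X}, \quad \eta = \bar{Y}, \quad \gamma\sigma_X^2 = \sigma_Y^2 = \tfrac12 s_X^2(\gamma + u),$$

$$\xi' = \bar{X}', \quad \eta' = \bar{Y}', \quad \gamma\sigma_X'^2 = \sigma_Y'^2 = \tfrac12 s_X'^2(\gamma + u'),$$

and $\gamma$ is the positive root of the equation

$$(N + N')\gamma^2 - (N - N')(u - u')\gamma - (N + N')uu' = 0$$

or

(11) $$\gamma = \frac{(N - N')(u - u') + \sqrt{(N - N')^2(u - u')^2 + 4(N + N')^2 uu'}}{2(N + N')} = \gamma_1, \text{ say}.$$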
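The positive root (11) is straightforward to compute. The following is a small sketch with an illustrative function name; it also checks that the root satisfies the quadratic from which it is derived.

```python
import math


def gamma_root(N, N_prime, u, u_prime):
    """Positive root of (N+N') g^2 - (N-N')(u-u') g - (N+N') u u' = 0,
    i.e. formula (11). Requires u, u_prime > 0."""
    a = N + N_prime
    b = (N - N_prime) * (u - u_prime)
    return (b + math.sqrt(b * b + 4.0 * a * a * u * u_prime)) / (2.0 * a)


if __name__ == "__main__":
    g = gamma_root(12, 8, 2.0, 3.0)
    residual = 20 * g * g - 4 * (2.0 - 3.0) * g - 20 * 2.0 * 3.0
    print(g, residual)
    # with equal sample sizes the root reduces to sqrt(u * u')
    print(gamma_root(10, 10, 2.0, 8.0))
```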
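Then

$$p(\omega \max) = \left(\frac{1}{2\pi}\right)^{N+N'} e^{-N-N'} \left[\frac{2\sqrt{\gamma_1}}{(\gamma_1 + u)s_X^2}\right]^N \left[\frac{2\sqrt{\gamma_1}}{(\gamma_1 + u')s_X'^2}\right]^{N'},$$

and the likelihood ratio criterion for the hypothesis $H_1$ is

(12) $$\lambda_1 = \frac{p(\omega \max)}{p(\Omega \max)} = \left[\frac{2\sqrt{\gamma_1}\,s_Y}{(\gamma_1 + u)s_X}\right]^N \left[\frac{2\sqrt{\gamma_1}\,s_Y'}{(\gamma_1 + u')s_X'}\right]^{N'} = \left[\frac{2\sqrt{\gamma_1 u}}{\gamma_1 + u}\right]^N \left[\frac{2\sqrt{\gamma_1 u'}}{\gamma_1 + u'}\right]^{N'}.$$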


For H'i , the values the parameters to make p(w) a maximum are: 




Thus 


p(fi max) 


/ 1 Vnn' 

\2ir/ (SX*)'^«(SX'*)^'«4«;^' 


Similarly, if we write 

„ _ 27S1S2 — i(^i — ^s)* rf> _ 2y'8'i8i — i(*i — ^*)* 

8? + si + i(^i - **)” + + 

and 

_ N8\ _ sl _ I + Rt , _ JVsk* _ 1+ 

4 + 5* 1 - Ri’ ^ ““ 2X'* 1 _ ft” 

the values to make p(w) a maximum are: 

= 4 = ^sr(<p + i;) 

r=?', V? = 2X'*(^ + 8) 

* _ (AT - AT'Xr - vO + V(iV - N')\v - v'Y + 4(Ar + N')"uu' 

(13) 2(iV + N>) 

* yi, say. 


Then 


p(w max) = 


/ly+^T 2NW. IT T' 

\2Ty L(yi + P)2X*J L(7» + ’ 
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and the likelihood ratio criterion for the hypotheais Ht is 
p{u> max) 


X* 


(14) 


_ r IT 

p(0 max) L(7 i + v) V^*J L(7» + 


2^rTT2;^'7' 

7* + »J Lyi+p'J • 


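The case N = N'. The above criteria $\lambda_1$ and $\lambda_2$ cannot in general be expressed
simply, but when $N = N'$, by (11) and (13)

$$\gamma_1 = \sqrt{uu'}, \qquad \gamma_2 = \sqrt{vv'},$$

and

$$\lambda_1 = \left[\frac{4\sqrt{uu'}}{(\sqrt{u} + \sqrt{u'})^2}\right]^N, \qquad \lambda_2 = \left[\frac{4\sqrt{vv'}}{(\sqrt{v} + \sqrt{v'})^2}\right]^N,$$

or we may express as monotonic functions of $\lambda_1$ and $\lambda_2$,

(15) $$L_1 = \lambda_1^{1/N} = \frac{4\sqrt{uu'}}{(\sqrt{u} + \sqrt{u'})^2},$$

(16) $$L_2 = \lambda_2^{1/N} = \frac{4\sqrt{vv'}}{(\sqrt{v} + \sqrt{v'})^2}.$$

Thus, $\lambda$'s, $L$'s, or their functions $u/u'$, $v/v'$ may be used as the criteria in the
present case.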

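Furthermore, if we introduce

(17) $$z = \tfrac12 \log u, \quad\text{and}\quad z' = \tfrac12 \log u',$$

we have

$$\tfrac12(z - z') = \tfrac14 \log\frac{u}{u'} \quad\text{or}\quad \sqrt{\frac{u}{u'}} = e^{z - z'}.$$

Thus $L_1$ can be written in terms of $z$ and $z'$

(18) $$L_1 = \frac{4}{\left(e^{\frac12(z - z')} + e^{-\frac12(z - z')}\right)^2} = \frac{1}{\cosh^2 \tfrac12(z - z')} = \operatorname{sech}^2 \tfrac12(z - z'),$$

and $z - z' = w$, say, may be used also as a criterion for $H_1$.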
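The equivalence between the form of L1 in terms of u, u' and the hyperbolic form (18) can be confirmed numerically. This is a sketch under the assumption that L1 = 4 sqrt(uu') / (sqrt(u) + sqrt(u'))^2, which is what (18) yields on substituting z = (1/2) log u and z' = (1/2) log u'; the function names are illustrative.

```python
import math


def L1_from_u(u, u_prime):
    """L1 expressed through the variance ratios u and u'."""
    return 4.0 * math.sqrt(u * u_prime) / (math.sqrt(u) + math.sqrt(u_prime)) ** 2


def L1_from_z(z, z_prime):
    """L1 expressed as sech^2((z - z') / 2), as in (18)."""
    return 1.0 / math.cosh(0.5 * (z - z_prime)) ** 2


if __name__ == "__main__":
    u, u_prime = 2.0, 5.0
    z, z_prime = 0.5 * math.log(u), 0.5 * math.log(u_prime)
    print(L1_from_u(u, u_prime), L1_from_z(z, z_prime))
```

Both forms equal 1 when u = u' and decrease as the two ratios move apart, so small values of L1 speak against the hypothesis.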

We shall now proceed to obtain the distributions of some of these statistics.

4. The distributions of $u/u'$ and $v/v'$. Since $N s_Y^2/\sigma_Y^2$ and $N s_X^2/\sigma_X^2$ have
independently the $\chi^2$ distribution with $N - 1$ degrees of freedom,

$$u = \frac{s_Y^2}{s_X^2} = \frac{\sigma_Y^2\chi_2^2}{\sigma_X^2\chi_1^2} = \frac{\gamma\chi_2^2}{\chi_1^2},$$

and $u/\gamma$ has the F distribution with degrees of freedom $f_1 = N - 1$, $f_2 = N - 1$.
Similarly, $u'/\gamma' = \chi_2'^2/\chi_1'^2$ has the F distribution with the same numbers of
degrees of freedom (since $N = N'$, in the present case).

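If the hypothesis $H_1'$ is true (i.e., $\gamma = \gamma'$)

(19) $$\frac{u}{u'} = \frac{\chi_2^2\,\chi_1'^2}{\chi_1^2\,\chi_2'^2} = \frac{\theta_2\theta_1'}{\theta_1\theta_2'} = \frac{z_1}{z_2},$$

where $\theta_i\,(= \tfrac12\chi_i^2)$ or $\theta_i'$ is distributed as

(20) $$\frac{1}{\Gamma(a_i)}\,\theta_i^{a_i - 1} e^{-\theta_i}\, d\theta_i$$

with $a_i = \tfrac12(N - 1)$, and $z_1\,(= \theta_1'\theta_2)$, $z_2\,(= \theta_1\theta_2')$ follow independently the Wilks'
z-distribution, [14], which we shall study in detail for the present case.
Distribution of z when p = 2: Consider

$$z = B\theta_1\theta_2 \cdots \theta_p.$$

Wilks has succeeded in integrating the distribution of z for the case $p = 2$ for
special values of a's, e.g., $a_1 = \tfrac12(N - 1)$, $a_2 = \tfrac12(N - 2)$. Now we want the
distribution of z when $p = 2$ and for any values of a, and then for $a_1 = a_2 =
\tfrac12(N - 1)$.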

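By (20) the joint distribution of $\theta_1$ and $\theta_2$ is

$$\frac{1}{\Gamma(a_1)\Gamma(a_2)}\,\theta_1^{a_1 - 1}\theta_2^{a_2 - 1} e^{-\theta_1 - \theta_2}\, d\theta_1\, d\theta_2.$$

Applying the transformation $z = B\theta_1\theta_2$, $v_1 = \theta_1$, the joint distribution of $v_1$, $z$ is

$$\frac{1}{\Gamma(a_1)\Gamma(a_2)}\, v_1^{a_1 - 1}\left(\frac{z}{Bv_1}\right)^{a_2 - 1} e^{-v_1 - z/(Bv_1)}\, \frac{dv_1\, dz}{Bv_1}.$$

Integrating $v_1$ from $v_1 = 0$ to $v_1 = \infty$, we have the distribution of $z$, viz.,

(21) $$\frac{z^{a_2 - 1}\, dz}{B^{a_2}\Gamma(a_1)\Gamma(a_2)} \int_0^\infty v_1^{a_1 - a_2 - 1} e^{-v_1 - z/(Bv_1)}\, dv_1.$$

In order to evaluate the integral of (21), consider the transformation $v_1 = y^2$,
$dv_1 = 2y\, dy$; we have

(22) $$I_0 = 2\int_0^\infty y^{2(a_1 - a_2) - 1} e^{-y^2 - z/(By^2)}\, dy.$$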


To evaluate 7o for any o’s, by putting y «= 1/*, dy “ — d*/**, we have 


Consider 

( 24 ) 


r(ai - a* + i) _ /*■ , 

Jt ^ 
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Then 


loTiai - 0, + i) - 2 J e“*’‘'y““‘*"*dy 

>d,l 




g-IG/*+K)»*+W««) ^ 


H 


z/B + y 


Since by the substitution B ^ ~ + y or y = ®* + 2 dV 

2^x + therefore 

hT{ax - a, + i) = 2Vr f + 2x yj/l)"*""’"* 


(26) 


r(ai — 02 + i) 
Hence, z is distributed as 


(26) 


2\/7rZ 


,02--i^-2VJ75 


5‘‘»r(ai)r(a2)r(oi ~ 02 + ^) 




We infer from this distribution that when $2(a_1 - a_2)$, i.e., the difference of
degrees of freedom, is odd, the integral can be expressed as a terminated series;
but for even values of $2(a_1 - a_2)$, the series is infinite.

When B = ^ , Oi = i(N — 1), oa = UN — 2), (26) is reduced to 

Jx. 

(27) 

r(oi)r(as) ’ 

which is Wilks’ ( distribution, [15], for p = 2. 

When B = 1 and oi = o* = i(N — 1), it becomes 

(28) r e-^(2V^ + x)-^~^dx, 
r(oi)r(aa) Jo ^ 

which is the distribution of z involved in (19). 

Since (28) can apparently not be simplified, I have been unable thus far to
find in manageable form the distribution of the ratio $z_1/z_2$ and therefore of $u/u'$
in this case. However, it would be simpler to use the alternative criterion
$w = z - z'$ for the hypothesis $H_1$. The distribution of $w$ will be taken up in a
later section.
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The distribuMon of v/v': Since Na^/tr^ and XX*lci have independency Cie x* 
distribution with — 1 and N degrees of freedom respectively, therefore, 

■ N^r 
1X» 


_ iVftr _ g* xi _ 7x1 
» « ~ 1 > 
VxXi Xi 


and 


V /N-1 
y/ N 


y 

■■ N. 


has the F-distribution with/i <= iV — 1 degrees of freedom and 


v' /N — I 

Similarly j has the F-distribution with degrees of freedom /i and/t 


as above. 

If the hypothesis is true (i.e., y = y*'), 

2 '2 ts, 

^ _ X 2 XI _ ^1^2 

St' 2 '2 ^ 

^ X1X2 ^ 1^2 


f 


where each $\theta$ is distributed as in (19), but with $a_1 = \tfrac12 N$ and $a_2 = \tfrac12(N - 1)$.
We can infer from (27) that $t_1 = 4\sqrt{z_1}$ and $t_2 = 4\sqrt{z_2}$ have independently
the $\chi^2$-distribution each with $4a_2$ or $2(N - 1)$ degrees of freedom, and $t_1/t_2 =
\sqrt{z_1/z_2} = \sqrt{v/v'}$ follows the F-distribution with degrees of freedom $f_1 = f_2 =
2(N - 1)$. The 5% and 1% points of the $F = \sqrt{v/v'}$ may be obtained from
Snedecor's table ([12], p. 174).


5. The distribution of $y = \log z$. Wald [13] has suggested that the
distribution of $z = B\theta_1\theta_2 \cdots \theta_p$ for any $a_i$ ($i = 1, \ldots, p$) may also be obtained
indirectly with the aid of the characteristic function. A similar method has been
applied in a recent paper by Wald and Brookner [14]. Consider the
transformation

(29) $$y = \log z = \log B\theta_1\theta_2 \cdots \theta_p.$$

The characteristic function of $y$ is

(30) $$\varphi_y(t) = E(e^{ty}) = E(B\theta_1\theta_2 \cdots \theta_p)^t = \frac{B^t\,\Gamma(a_1 + t)\Gamma(a_2 + t) \cdots \Gamma(a_p + t)}{\Gamma(a_1)\Gamma(a_2) \cdots \Gamma(a_p)}.$$
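For real arguments, the right-hand side of (30) gives the moments of z and can be evaluated with the gamma function. The sketch below is restricted to real t (the complex values needed for the inversion integral are outside its scope); the function name is illustrative.

```python
import math


def moment_of_z(t, a, B=1.0):
    """E[z^t] for z = B * theta_1 * ... * theta_p, where theta_i follows the
    gamma law (20) with parameter a[i]; valid for real t with a[i] + t > 0."""
    value = B ** t
    for a_i in a:
        value *= math.gamma(a_i + t) / math.gamma(a_i)
    return value


if __name__ == "__main__":
    # first moment: E[theta_i] = a_i, so E[z] = B * a_1 * a_2
    print(moment_of_z(1.0, (2.0, 3.0)))
    print(moment_of_z(1.0, (2.0, 3.0), B=0.25))
```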

Thus the distribution f(y) dy is given by 


(31) 


~^iL" ■“ = r' rt 


»-l 


r(a,) 


dt. 


2vt X-ioo 

Without loss of generality, we may take $a_1 \ge a_2 \ge \cdots \ge a_p > 0$ and let

$a_p + t = -t'$, then


(32) f(y) = f r(o< - Op - t') dt', 

Im d-op-im t-1 

where c, » j r(o,). 
The integration can be carried out by the method of residues along the contour
C, bounded by the line $x = -a_p$ and that part of the circle with center at
origin and radius r, which lies to the right of the line $x = -a_p$. The integral
of the function $e^{t'y}B^{-t'}\prod_{i=1}^{p}\Gamma(a_i - a_p - t')$ along the arc converges to zero
as the radius of the circle tends to infinity (Kullback, [8]). Hence the integrals
along the vertical line $x + a_p = 0$ and along the closed contour C are equal.
Then we may write
(33) f{y) - f fl r(c -0,-0 df', 

ZTl Jc 1 

and its value is c, times the sum of the residues at the poles within the con- 
tour C. 

For the present purpose, p = 2, we have 

(34) f(y) = ^. / e‘'‘'r(oi - o, - 0r(-0<ft'. 

We shall study the integral of (34) in more detail in the following cases:

(i) $a_1 - a_2 = \tfrac12$. By the duplication formula

$$\Gamma(\tfrac12 - t')\Gamma(-t') = 2^{1 + 2t'}\sqrt{\pi}\;\Gamma(-2t'),$$


and the function 


r(-20 


lim 

W-»io 


(-2<')(-2«' + 1) {-2t' + Ny 


has simple poles at the points $0, \tfrac12, 1, \tfrac32, \ldots$. The residue at $t' = m/2$,

where m is zero or a positive integer, is $(-1)^{m+1}/(2 \cdot m!)$ and (34) becomes

(36) + •••) 

The distribution of z » e*' is 


(27 bis) 


r(oi)r(a.) 


(ii) Oi — Of •“ m + i. The function 


r(0i - 0, - or(-0 - (m-i- 0 (m-|- 0 --- (§ - Or(J - 0 r (-0 
- 2‘+*‘ V5?(»» - J - tr)im -. J - 0 • • • (i - Or (-20 
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has simple polra at 0, m, m + }, m + 1, • • • , and 


f(v) = 

- Vircij^, 


1 


(2m — 1)1 

2^(m -!)!“■ 2r(2m) 


(2i‘eT + 




2*(2m + 1) 


2-(2m + 2) 


•] 


(2m - 1)1 
2»— »(m -1)1 2" (2m+^yl 


--t 

Am 


I (a**')’ 


,»»+r/»J ^ 


This agrees with the expansion of (26) when we put Oi — oi — ^ m. 
(iii) oj -• ot « 0. The function 


[r(-<')r = lim 


m)*N-**' 


(-0*(-<' + 1)* • • • (-<' 4- JV)*' 
has poles of the second order at the points 0, 1, 2, 3, • • • and 

fiv) “ {«' - r)*e‘'''[r(-0]*}r'-. 

(iv) ai — 02 = m. The function 

r(m - Or(-<') = (m - 1 - t')(m - 2 - t*) (1 - tO(-Oir(-01* 

I 

has finite simple poles at 1, 2, • • • , m — 1 and poles of the second order at m, 
m + !,•••, and 

m = c, E {« - 7)e''''r(m - <0r(-«0}i'^ 


7-0 


+ Ct E (4 («' - 7 )*e‘'‘'r(m - t')T{-t')\ . 

7— m ^ 
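Case (i) above rests on the duplication formula for the gamma function. A numerical check is sketched below for real arguments at which every gamma function is finite; the function names are illustrative.

```python
import math


def duplication_lhs(t_prime):
    """Gamma(1/2 - t') * Gamma(-t')."""
    return math.gamma(0.5 - t_prime) * math.gamma(-t_prime)


def duplication_rhs(t_prime):
    """2^(1 + 2t') * sqrt(pi) * Gamma(-2t')."""
    return 2.0 ** (1.0 + 2.0 * t_prime) * math.sqrt(math.pi) * math.gamma(-2.0 * t_prime)


if __name__ == "__main__":
    # negative, non-half-integer values of t' keep all arguments away from poles
    for t_prime in (-0.3, -1.25, -2.7):
        print(t_prime, duplication_lhs(t_prime), duplication_rhs(t_prime))
```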


6. The distribution oiw = t — at ^ — cosh w. Since the distribution of u 
is given in [7] as 


(39) 


yBlkiN 




therefore, by transformation (17), we have that the distribution of z for a given 
f == i log 7 = i log is 

1 — p 




sech" (2 — {*) dz, 


B 


(40) 





where n N — 1. The distribution of t has been given by R. A. Fisher [3] 
forn n 1 and by Deluiy [2]. Similarly, the distribution of s' for a given f' is 


(41) 


B 


^jWsech-'Cs'-r')*', 

\2’ 2 ; 


where n' = N' — 1. 

In case n = n', the joint distribution of z and 2 ' for a given common f is 

(42) CBech- (, - e) secb- (»- - 04. 

where l/C - 

By the transformation 2 = i(2 + 2')> w — z — z', we have the joint distri- 
bution of 2 and w, 

(43) 


C didw 


2”Cd2dir 


[cosh* (z — f) cosh" (s' — f)] [cosh 2(2 — f ) + cosh u)]"’ 

Integrating with respect to 2 from — « to « , we have 

d2 


2"Cdw 


£ 


(44) 


[cosh 2(2 — f ) + cosh «>]" 

= 2"CdM) f 


2dz 


Jo [cosh 2(2 — f) + cosh to]" 
= 2"Cdto7n, say. 

Applying the transformation ^ = 2(2 — f), ^ = cosh to, the integral of (34) 
becomes 

^ 

Jo (cosh 0 + 1 ^)" 

1 j- ^ 

Substituting cosh 4> + ^ — - — , we have 


Jo Vi + ^/ e 


did 


(46) 


/(■-M 


(l^ + 




did. 


Ck>mparing (35) with the hypergeometric fimction 
(46) J - - dxrdB = ~ F(o, 6, c, x), 
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we have & » n, e — a = and therefore (36) can be exprea^ in terms 

of a hypergeometric series as 

r _ r(n)r(i) 1 r./i _ „ ^ 1 ^ - 1\ 

'• ■ r(J+T) » + «■ '^V’ iFTi)- 


^ — I 

The series (37) is convergent since ^ ^ is less than unity. Thus the distri- 


bution of w, from (34), is 


2"Cr(n)r(J) 1 „/, 1 coshw 

r(n + i) (coshw+D- V’”’”‘^*’co8htD + 


r(» + i) (cosh w + 1)" 
and the distribution of ^ cosh w is 

2"+‘cr(n)r(i) 1 


i)i», 


- 1 )' '■(*’ hrl) * 


r(»+}) » + V” 

We notice that the distribution of i/ expressed in (39) is very similar to the 
r-distribution expressed in terms of hypergeometric series, except that in the 

first case the argument is r - r ^ i while in the second case it is ; — r—^ where 

^ + 1 1 + p 

p = pr. Hotelling [5] has obtained a very rapidly convergent hypergeometric 

series for the distribution of the correlation coefficient since | p | < 1. But 

for the distribution of we cannot obtain a more rapidly convergent series than 

(39), since the values of ^ lie between 1 and «. 


7. Summary and remark. Two hypotheses concerning the comparison of
correlation coefficients of two samples from bivariate normal populations have
been considered. The appropriate test criteria for each hypothesis have been
derived by the use of a transformation of the variates. The distributions of
certain of the criteria have been obtained in the special case where $N = N'$.
Incidentally the distribution of Wilks' z for $p = 2$ and any values of $a_1$ and $a_2$
has been derived.

Again though we assume throughout the paper that $\sigma_1 = \sigma_2$ and $\sigma_1' = \sigma_2'$, the
tests can be generalized to fit the case where the ratios $\sigma_1/\sigma_2 = k$, $\sigma_1'/\sigma_2' = k'$
are known, but are different from unity. In the latter case we can apply the
transformation

$$y_1 = w_1 x_1, \quad y_2 = w_2 x_2;$$

$$y_1' = w_1' x_1', \quad y_2' = w_2' x_2';$$

where

$$w_1 k_1 = w_2 k_2 = 1, \qquad w_1' k_1' = w_2' k_2' = 1,$$

so that after transformation the variances of each pair of y's are equal.

The writer is deeply indebted to Professor Harold Hotelling and Dr. Abraham 
Wald for their advice and suggestions in the preparation of this paper. 
ON RANDOMNESS IN ORDERED SEQUENCES

By L. C. Young

Westinghouse Electric and Manufacturing Company

It is frequently desirable to examine an ordered sequence of measurements
for the presence of non-random variability, concern over any particular type of 
variability being limited. Unless the sequence is one containing replicated 
observations, current methods of analysis often restrict an investigation to 
tests for specific forms of variability, such as particular orders of regression and 
periodicity. In order to simulate replication, arbitrary grouping of data is 
occasionally used and followed by some test of variance; this practice, however, 
is likely to add an element of bias to the investigation. 

Under these conditions, it would be convenient to have the means of testing a 
series for the presence of general regression, before proceeding to test for that of 
a specific type. It is the purpose of this paper to present, as briefly as possible, 
a statistic designed for this preliminary type of examination, and to demonstrate 
its application. 

If a given sequence of measurements be denoted by

$$X_1, X_2, \ldots, X_n$$

then the magnitude of

$$C = 1 - \frac{\sum_{i=1}^{n-1} (X_i - X_{i+1})^2}{2\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

will be dependent upon the arrangement of the n observations upon which it is
based. C will have n! possible values for a given sample, corresponding to the
number of permutations of n items.
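The statistic can be computed directly from its definition. The sketch below assumes the form C = 1 - (sum of squared successive differences) / (2 × sum of squared deviations from the mean), which agrees with the expansion of the numerator given in the next section; the function name is illustrative.

```python
def c_statistic(xs):
    """C = 1 - sum((x_i - x_{i+1})^2) / (2 * sum((x_i - mean)^2)).

    xs must contain at least two values that are not all equal."""
    n = len(xs)
    mean = sum(xs) / n
    successive = sum((xs[i] - xs[i + 1]) ** 2 for i in range(n - 1))
    spread = sum((x - mean) ** 2 for x in xs)
    return 1.0 - successive / (2.0 * spread)


if __name__ == "__main__":
    print(c_statistic([-1.0, 0.0, 1.0]))   # a steady trend gives a positive C
    print(c_statistic([-1.0, 1.0, 0.0]))   # a more erratic order gives a smaller C
```

The value depends on the order of the observations as well as on the observations themselves.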

1. Moments of the distribution of C in terms of the moments of a
finite sequence. Writing C in terms of $x_1, \ldots, x_n$, representing the
deviations of $X_1, \ldots, X_n$ from their sample mean of n measurements,

$$C = 1 - \frac{\sum_{1}^{n-1} (x_i - x_{i+1})^2}{2\sum_{1}^{n} x_i^2} = \frac{x_1^2 + x_n^2 + 2\sum_{1}^{n-1} x_i x_{i+1}}{2\sum_{1}^{n} x_i^2}.$$
In order to find the mean value of C for a given sample, it must be summed
over all values obtained from the n! permutations of the measurements.
Dealing with the numerator alone of the expression given above:

$$\Sigma_p\Big(x_1^2 + x_n^2 + 2\sum_{1}^{n-1} x_i x_{i+1}\Big) = \Sigma_p x_1^2 + \Sigma_p x_n^2 + 2\,\Sigma_p\Big(\sum_{1}^{n-1} x_i x_{i+1}\Big),$$

where $\Sigma_p$ denotes summation over the n! permutations.

There are n values of $x_i$, and n! arrangements. Each value $x_i$ is $x_1$ in
$(n - 1)!$ of the arrangements: the same reasoning applies to $x_n$. The first two
terms of the summation, therefore, will be

$$\Sigma_p x_1^2 = \Sigma_p x_n^2 = (n - 1)! \sum_{1}^{n} x_i^2.$$

With regard to the third term, there are $2(n - 1)$ of such cross-products for
each arrangement. Since the summation is taken over n! arrangements, $x_j x_k$
will be different from $x_k x_j$ and should be considered a separate term. Each
cross-product term, therefore, must occur $\frac{2(n-1)\,n!}{n(n-1)} = 2(n-1)!$ times throughout the n!
arrangements, since there are $n(n - 1)$ possible cross-products among n different
items. The third term, then, will be

$$2\,\Sigma_p\Big(\sum_{1}^{n-1} x_i x_{i+1}\Big) = 2(n - 1)! \sum_{j \ne k} x_j x_k = -2(n - 1)! \sum_{1}^{n} x_i^2,$$

from which it may be seen that the mean value of C is zero for any sample.

The same method may be applied in order to find the second and higher
moments of C. Squaring the numerator of the expression and expanding,

$$\left(x_1^2 + x_n^2 + 2\sum_{i=1}^{n-1} x_i x_{i+1}\right)^2 = x_1^4 + x_n^4 + 2x_1^2 x_n^2 + 4x_1^2 \sum_{i=1}^{n-1} x_i x_{i+1} + 4x_n^2 \sum_{i=1}^{n-1} x_i x_{i+1} + 4\left(\sum_{i=1}^{n-1} x_i x_{i+1}\right)^2.$$

Performing the summation $S_p$ term by term we obtain

$$\frac{S_p\left(x_1^2 + x_n^2 + 2\sum_{i=1}^{n-1} x_i x_{i+1}\right)^2}{n!} = \frac{2(2n-3)\left(\sum x_i^2\right)^2 - 2n\sum x_i^4}{n(n-1)},$$

whence the second moment of C for any sample is given by

$$M_2 = \frac{2n - 3 - m_4/m_2^2}{2n(n-1)},$$

where $m_2$ and $m_4$ are the second and fourth moments, respectively, of the n
observations about their mean.
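
The two results above (a zero mean and the expression for the second moment) can be checked by brute force for a small sample. The following is only a sketch: the helper names and the six sample values are illustrative assumptions, and full enumeration of the n! orderings is practical only for small n.

```python
from itertools import permutations


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def c_statistic(seq):
    """C = 1 - sum (X_i - X_{i+1})^2 / (2 * sum (X_i - mean)^2)."""
    centre = mean(seq)
    successive = sum((a - b) ** 2 for a, b in zip(seq, seq[1:]))
    spread = sum((v - centre) ** 2 for v in seq)
    return 1 - successive / (2 * spread)


def permutation_moments(sample):
    """Mean and second moment of C over all n! orderings of the sample."""
    values = [c_statistic(order) for order in permutations(sample)]
    return mean(values), mean(v * v for v in values)


def second_moment_formula(sample):
    """M2 = (2n - 3 - m4 / m2^2) / (2n(n - 1)), from the sample moments."""
    n = len(sample)
    centre = mean(sample)
    m2 = mean((v - centre) ** 2 for v in sample)
    m4 = mean((v - centre) ** 4 for v in sample)
    return (2 * n - 3 - m4 / m2 ** 2) / (2 * n * (n - 1))


sample = [7.4, 8.8, 11.4, 10.3, 11.9, 12.2]  # illustrative values only
mean_c, m2_enumerated = permutation_moments(sample)
print(mean_c, m2_enumerated, second_moment_formula(sample))
```

Up to floating-point rounding, the enumerated mean should vanish and the enumerated second moment should agree with the closed form.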

In like manner, the third and fourth moments of the distribution of C for a
given sample of n observations are found to be

$$M_3 = \frac{-6 + 4(n-3)\dfrac{m_3^2}{m_2^3} + 9\dfrac{m_4}{m_2^2} - 3\dfrac{m_6}{m_2^3}}{4n(n-1)(n-2)},$$


Mi 

8n*(n — 1)(» — 2)(n — 3) 

3)* - 48n(4n - 9) ^ 
m 


- 24n(3n* - 17« + 27) ^ + 

(8n* - 

45n* - 23n + 210) 


mt 


mi 


+ 16(2n* + 5n - 21) + 

4(17n* 

- Zln + 42) — ? 


mt 


ml 


- (7n* + 13n - 6) — { 


2. Distribution of C for samples drawn from a normal universe. The
first four moments of the distribution of C for samples drawn from a given
population may be derived from the above formulae by substituting the mean values
of $m_3^2/m_2^3$, $m_4/m_2^2$, etc. of samples from such a population. For normal
samples containing n observations, for example, the following mean values apply,
as obtained by the method presented by R. A. Fisher [1, 2]:


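$$\begin{aligned}
\frac{m_3^2}{m_2^3} &= \frac{6(n-2)}{(n+1)(n+3)}, \\
\frac{m_4}{m_2^2} &= \frac{3(n-1)}{n+1}, \\
\frac{m_4^2}{m_2^4} &= \frac{3(3n^3 + 23n^2 - 63n + 45)}{(n+1)(n+3)(n+5)}, \\
\frac{m_3 m_5}{m_2^4} &= \frac{60(n-1)(n-2)}{(n+1)(n+3)(n+5)}, \\
\frac{m_6}{m_2^3} &= \frac{15(n-1)^2}{(n+1)(n+3)}, \\
\frac{m_8}{m_2^4} &= \frac{105(n-1)^3}{(n+1)(n+3)(n+5)}.
\end{aligned}$$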


Replacement of the sample moment ratios by the mean values of those ratios
for normal samples yields the following moments of C:

$$M_2 = \frac{n-2}{(n-1)(n+1)}, \qquad M_4 = \frac{3(n^2 + 2n - 12)}{(n-1)(n+1)(n+3)(n+5)}.$$

Compatible results for the case of normal samples have been obtained by
Williams [3], using another method.

From the above results, the value of

$$\beta_2 = \frac{M_4}{M_2^2} = \frac{3(n^2 + 2n - 12)(n-1)(n+1)}{(n-2)^2 (n+3)(n+5)}$$

is seen to approach normality (the value 3) as the sample size is increased.

Inasmuch as the distribution of C for normal samples is limited in both
directions and is symmetrical, it is apparent that the Pearson Type II distribution
may be considered representative. Fitting this curve to the moments given
above, the equation of the frequency distribution is given by

$$y = y_0 \left(1 - \frac{C^2}{a^2}\right)^m,$$

where

$$m = \frac{n^4 - n^3 - 13n^2 + 37n - 60}{2(n^3 - 13n + 24)},$$

$$a^2 = \frac{(n^2 + 2n - 12)(n-2)}{n^3 - 13n + 24},$$

$$y_0 = \frac{\Gamma(2m+2)}{a \cdot 2^{2m+1}\,[\Gamma(m+1)]^2}.$$

The values of $\beta_2$ for the distribution, for various values of n, are as follows:

    Sample size, n      β2
          5           2.300
         10           2.570
         15           2.684
         20           2.750
         25           2.793
         50           2.889
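
The kurtosis values and the Type II parameters can be recomputed directly from the expressions of this section. The sketch below assumes the forms $\beta_2 = 3(n^2+2n-12)(n-1)(n+1)/[(n-2)^2(n+3)(n+5)]$, $m = (n^4-n^3-13n^2+37n-60)/[2(n^3-13n+24)]$ and $a^2 = (n^2+2n-12)(n-2)/(n^3-13n+24)$; the function names are illustrative. It also uses the standard Type II relations $\mu_2 = a^2/(2m+3)$ and $\beta_2 = 3(2m+3)/(2m+5)$ as a consistency check.

```python
def beta2(n):
    """Kurtosis of C for normal samples of size n."""
    return (3 * (n**2 + 2 * n - 12) * (n - 1) * (n + 1)
            / ((n - 2) ** 2 * (n + 3) * (n + 5)))


def type2_parameters(n):
    """Exponent m and squared half-range a^2 of the fitted Type II curve."""
    d = n**3 - 13 * n + 24
    m = (n**4 - n**3 - 13 * n**2 + 37 * n - 60) / (2 * d)
    a_squared = (n**2 + 2 * n - 12) * (n - 2) / d
    return m, a_squared


for n in (5, 10, 15, 20, 25, 50):
    m, a_squared = type2_parameters(n)
    # Variance and kurtosis implied by the fitted curve.
    variance = a_squared / (2 * m + 3)
    kurtosis = 3 * (2 * m + 3) / (2 * m + 5)
    print(n, round(beta2(n), 3), round(kurtosis, 3), round(variance, 5))
```

For each n the kurtosis implied by the fitted curve should reproduce $\beta_2$, and the implied variance should equal $(n-2)/[(n-1)(n+1)]$.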


Due to the effect of even moments higher than the fourth, the approximation
afforded by the Type II curve is not reliable for samples containing fewer than
about eight observations. As the sample size decreases below this limit, the
extremes of the C distribution deviate increasingly from the extremes (±a)
of the fitted curve: with such a platykurtic distribution, therefore, the effect
upon the lower significance levels vitiates the approximation.

Although either $\beta_2$ or the theoretical limits of the distribution of C could
have been employed as a parameter of the fitted curve, it was considered
expedient to use the former. In any case, of course, the advantage to be gained
would be in connection only with samples containing few observations (fewer
than eight). The evidence afforded by empirical sampling indicates that use
of the limits as a parameter might render the approximation less valid.
In order to facilitate use of the approximate distribution for samples of eight
or more observations, the values of C associated with two probability levels are
tabulated below in Table I. The ratio of each value of C to its standard error
is also shown, to demonstrate the approach to normality. The significance
levels recorded exclude 10% and 2% of the area under the curve, respectively.
In most practical applications, these will be the 5% and 1% levels, respectively,
since only positive values of C exceeding the tabulated value will ordinarily be
considered significant. The tabulations were prepared from tables of the
function $I_x(p, q)$ [5], where q = .5 and p = m + 1, with the transformation

$$x = 1 - \frac{C^2}{a^2}.$$


TABLE I

Significance levels of the absolute value of C

    Sample size, n    C (P = .10)    C.10/σ_C    C (P = .02)    C.02/σ_C
           8             .5088        1.6486        .6686        2.1664
           9             .4878        1.6492        .6456        2.1826
          10             .4689        1.6494        .6242        2.1958
          11             .4517        1.6495        .6044        2.2068
          12             .4362        1.6495        .5860        2.2161
          13             .4221        1.6495        .5691        2.2241
          14             .4092        1.6494        .5534        2.2310
          15             .3973        1.6493        .5389        2.2369
          16             .3864        1.6492        .5254        2.2423
          17             .3764        1.6492        .5128        2.2470
          18             .3670        1.6491        .5011        2.2513
          19             .3583        1.6489        .4900        2.2550
          20             .3502        1.6488        .4797        2.2585
          21             .3426        1.6488        .4700        2.2616
          22             .3356        1.6486        .4609        2.2647
          23             .3288        1.6485        .4521        2.2676
          24             .3224        1.6484        .4440        2.2700
          25             .3165        1.6484        .4361        2.2717
    Normal (n = ∞)                    1.6447                     2.3262


The distribution of C for normal samples containing 20 or more observations
is sufficiently normal, for most practical cases and for the more common
significance levels, to permit use of a table of areas under the normal curve, in
conjunction with the standard error

$$\sigma_C = \sqrt{\frac{n-2}{(n-1)(n+1)}}.$$

The 5% significance levels shown in Table I result, at worst, in a one per cent
error of probability estimate, if the normal approximation is used in their
place: that is, if 1.6447 times the standard error is used instead of the
tabulated significance level, the probability will be .0505 at most, for the
values of n which are tabulated.


3. General discussion on the application of C. It may be wondered
why the statistic C has been used, rather than the more easily computed statistic

$$C' = \frac{\sum_{i=1}^{n-1} (X_i - X_{i+1})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$

As far as a significance test is concerned, it clearly
does not matter which is used, since C and C' are linearly related. However, C
may be regarded as symmetrically distributed about 0 in samples from a normal
population to within at least four moments. Excessive departure of C from 0
may be taken as indicative of the presence of non-randomness in the series, the
actual significance test being based, of course, on the probability of obtaining a
departure larger than a given observed one, under the assumption of a random
series. Positive values of C, in general, correspond to positive correlation while
negative values correspond to negative correlation between successive
observations.

There are various ways of detecting non-randomness in a series of observations,
such as regression methods, analysis of variance, etc. The use of regression
methods implies that we must know in general the type of regression function
to be tried. C is a very flexible statistic, on the other hand, for testing the null
hypothesis that a series is random, no matter what the alternative hypothesis is.
A thorough study of C as a statistic for testing the hypothesis of randomness in
an ordered series should include a study of the power function of C for hypotheses
specifying various types of non-randomness. However, we shall simply appeal
to intuition in proposing the statistic C, and forego power function considerations
in this note. In practice, the advantage of using C increases with the length of a
series: lack of randomness in a single sequence of ten or fewer observations may
ordinarily be detected by regression methods, in fitting a low order polynomial.
In a longer sequence of measurements, on the other hand, the presence of
complicated regression or of periodicity is often sufficiently obscured by variation
to elude detection by any other than a flexible method.

The statistic could be used to advantage in the field of applied statistics, in
the investigation not only of variate series but of attribute series as well. For
the latter purpose, an effort to tabulate the relationship between the level of
significance and the percentage of either attribute would facilitate statistical
investigation of random arrangement. A direct application could thus be made
to binomially distributed attributes by a scalar assignment (0, 1) to the
dichotomy, followed by a procedure similar to that presented above. Similarly, the
randomness of vectorial observations could be examined from the viewpoint of
arrangement. The common method of treating such problems, the "random
walk method," has occasionally been found inadequate in dealing with specific
forms of non-random order; this is especially true when the allocable cause of
variation has a multi-directional effect.

Needless to say, each of the fields of application considered so briefly above
would require development before a routine, efficient method of investigating
ordered arrangement could be established. Although probability level tables
have been provided in this paper for C as applied to normal samples, it is quite
evident that tables for samples from other parent distributions would be needed
for some of the applications mentioned above.

4. An illustration of the use of C. Although one example has already
been presented elsewhere [4] in which the distribution developed in Section 2
has been employed, a typical application of the statistic to an example in the
field of quality control will be given here in order to illustrate the mechanics of
solution. The data presented in Table II represent the percentages of defective
product turned out daily, over a period of twenty-four days, by a single workman.
The total output each day closely approximates five hundred parts: this fact is
brought out to explain the calculation of $\chi^2$ for the observed series of
percentages; it has no bearing upon the use of C.

TABLE II

Percentage of product rejected

    Day       %, X        X²       (X_i − X_{i+1})²
      1        7.4       54.76
      2        8.8       77.44          1.96
      3       11.4      129.96          6.76
      4       10.3      106.09          1.21
      5       11.9      141.61          2.56
      6       12.2      148.84           .09
      7       10.0      100.00          4.84
      8        8.4       70.56          2.56
      9        9.4       88.36          1.00
     10       10.9      118.81          2.25
     11        9.9       98.01          1.00
     12       11.8      139.24          3.61
     13       10.0      100.00          3.24
     14        8.9       79.21          1.21
     15        9.7       94.09           .64
     16        9.3       86.49           .16
     17       12.0      144.00          7.29
     18       12.3      151.29           .09
     19       10.3      106.09          4.00
     20        8.6       73.96          2.89
     21       10.4      108.16          3.24
     22       11.1      123.21           .49
     23        9.4       88.36          2.89
     24        8.2       67.24          1.44
    Totals   242.6     2495.82         55.42

    $n\bar{X}^2 = 2452.28$
    $\sum x^2 = 43.54$

C = .3636 (significant); $\chi^2$ = 21.518 (23 degrees of freedom) (not significant).
The value of C derived from the data lies between the two significance levels
tabulated in Table I; there is reason to believe that the data are ordered, or
non-random. Computation of $\chi^2$, however, has been carried out with the hypothesis
that all product was made under the same conditions (i.e. with a percentage
defective equal to 10.108%, the mean of the group). The value so obtained is
associated with a probability of about P = .50; the hypothesis is not disproved
by this test. In short, the variability of the twenty-four observations could be
considered random if it were not for the order of their arrangement.
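
The mechanics of the illustration can be reproduced in a few lines. The sketch below assumes the definition $C = 1 - \sum (X_i - X_{i+1})^2 / [2\sum (X_i - \bar{X})^2]$ and the normal-sample standard error $\sqrt{(n-2)/[(n-1)(n+1)]}$; the variable names are illustrative. Because it works from the unrounded sums, the computed C may differ from the printed .3636 in the third decimal place.

```python
import math

# Daily percentages defective from Table II (24 days).
x = [7.4, 8.8, 11.4, 10.3, 11.9, 12.2, 10.0, 8.4, 9.4, 10.9, 9.9, 11.8,
     10.0, 8.9, 9.7, 9.3, 12.0, 12.3, 10.3, 8.6, 10.4, 11.1, 9.4, 8.2]

n = len(x)
x_bar = sum(x) / n

# Sum of squared successive differences (the last column of Table II).
successive = sum((a - b) ** 2 for a, b in zip(x, x[1:]))
# Sum of squared deviations from the mean.
spread = sum((v - x_bar) ** 2 for v in x)

c = 1 - successive / (2 * spread)
sigma_c = math.sqrt((n - 2) / ((n - 1) * (n + 1)))

# Table I levels for n = 24 at P = .10 and P = .02.
level_10, level_02 = 0.3224, 0.4440
print(round(c, 4), round(c / sigma_c, 3), level_10 < c < level_02)
```

The ratio of C to its standard error is printed alongside C so that it can be compared with the corresponding columns of Table I.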
ON CERTAIN LIKELIHOOD-RATIO TESTS ASSOCIATED WITH THE
EXPONENTIAL DISTRIBUTION


By Edward Paulson
Washington, D.C.

Various likelihood-ratio tests and their distributions in samples from a
population having the elementary probability law
$\frac{1}{\sigma} e^{-(x-\beta)/\sigma}\,dx$, $\beta \le x \le \infty$, have been
studied by Neyman and Pearson [1] and Sukhatme [2]. In this note the power
functions and the question of bias of several likelihood-ratio tests will be
investigated. The exponential distribution appears to be appropriate for dealing
with problems involving the intervals of time between events which tend to be
random, as for example the interval between consecutive telephone calls, or
the interval between consecutive accidents to the same worker.

To test the hypothesis H' that the location parameter $\beta$ is equal to some
fixed value, it being assumed that the scale parameter $\sigma$ is known, we can for
simplicity take the set $\Omega$ of admissible populations from which the sample might
have been drawn to be $\{-\infty < \beta < +\infty,\ \sigma = 1\}$, while the subset $\omega$ from
which the sample must come when the hypothesis is true is $\{\beta = 0,\ \sigma = 1\}$.
Then the likelihood-ratio $\lambda_1$ for testing this hypothesis is

$$\lambda_1 = \frac{P(\omega \text{ max.})}{P(\Omega \text{ max.})} = \frac{e^{-\sum_{i=1}^{n} x_i}}{e^{-\sum_{i=1}^{n} (x_i - x_1)}} = e^{-n x_1},$$


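where $x_1$ is the smallest observation in a random sample of n. The region of
acceptance of this hypothesis consists of all points in sample space for which

$$\lambda_{1\alpha} \le \lambda_1 \le 1,$$

where $\lambda_{1\alpha}$ is chosen so that $\int_{\lambda_{1\alpha}}^{1} g_1(\lambda_1)\,d\lambda_1 = 1 - \alpha$, $\alpha$ being the level of significance
used and $g_1(\lambda_1)\,d\lambda_1$ being the distribution of $\lambda_1$ when $\beta$ is really equal to zero.
The region $\lambda_{1\alpha} \le \lambda_1 \le 1$ is equivalent to the region in the sample space for which

$$0 \le x_1 \le k_1; \qquad k_1 = -\frac{\log \lambda_{1\alpha}}{n}.$$

For any value of $\beta$ the distribution of $x_1$ is known [3] to be

$$\varphi(x_1)\,dx_1 = n e^{-n(x_1 - \beta)}\,dx_1, \qquad \beta \le x_1 \le \infty.$$
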
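Setting $\beta = 0$, the relationship between $k_1$ and $\alpha$ is

$$\int_0^{k_1} n e^{-n x_1}\,dx_1 = 1 - \alpha, \quad \text{so} \quad e^{-n k_1} = \alpha.$$

When $\beta < 0$, the power function $P(\beta)$ for this test is

$$P(\beta) = 1 - \int_0^{k_1} n e^{-n(x_1 - \beta)}\,dx_1 = 1 - e^{n\beta}[1 - \alpha].$$

When $0 < \beta < k_1$,

$$P(\beta) = 1 - \int_{\beta}^{k_1} n e^{-n(x_1 - \beta)}\,dx_1 = \alpha e^{n\beta}.$$

When $\beta > k_1$, $P(\beta) = 1$.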

Since $e^{n\beta} > 1$ if $\beta > 0$ and also $e^{n\beta} < 1$ if $\beta < 0$, $P(\beta)$ is obviously $> \alpha$ if
$\beta \ne 0$. This test is therefore completely unbiased in the sense of Daly [4].
In addition, it is not difficult to prove that this test has the unusual property
of being a uniformly most powerful test with respect to all alternatives.
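
The piecewise power function of this first test is simple enough to evaluate directly. The sketch below assumes $\sigma = 1$, the relation $e^{-n k_1} = \alpha$, and the three branches $1 - e^{n\beta}(1-\alpha)$ for negative location values, $\alpha e^{n\beta}$ between 0 and $k_1$, and 1 beyond $k_1$; the function names and the grid of trial values are illustrative.

```python
import math


def k1(n, alpha):
    """Upper limit of the acceptance region, from exp(-n * k1) = alpha."""
    return -math.log(alpha) / n


def power(beta, n, alpha):
    """Power of the test of beta = 0 when the true location is beta (sigma = 1)."""
    if beta <= 0:
        return 1 - math.exp(n * beta) * (1 - alpha)
    if beta < k1(n, alpha):
        return alpha * math.exp(n * beta)
    return 1.0


n, alpha = 10, 0.05
for beta in (-1.0, -0.1, 0.0, 0.1, 0.2, k1(n, alpha), 1.0):
    print(round(beta, 4), round(power(beta, n, alpha), 4))
```

The printed values should equal the significance level at zero and exceed it everywhere else, which is the unbiasedness property discussed above.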

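To test the hypothesis that the location parameter is equal to some fixed
value, say $\beta = 0$, when the scale parameter $\sigma$ is unknown, the likelihood-ratio
is easily seen to be

$$\lambda_2 = \left[\frac{\sum_{i=1}^{n}(x_i - x_1)}{\sum_{i=1}^{n} x_i}\right]^{n} = \left[\frac{1}{1 + \dfrac{n x_1}{\sum_{i=1}^{n}(x_i - x_1)}}\right]^{n}.$$
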

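The region of acceptance consists of all points in the sample space for which
$\lambda_{2\alpha} \le \lambda_2 \le 1$, where $\int_{\lambda_{2\alpha}}^{1} g_2(\lambda_2)\,d\lambda_2 = 1 - \alpha$. This is equivalent to the region

(1)  $$0 \le t \le k_2; \qquad t = \frac{n x_1}{\sum_{i=1}^{n}(x_i - x_1)/(n-1)}, \qquad k_2 = (n-1)\,\frac{1 - \lambda_{2\alpha}^{1/n}}{\lambda_{2\alpha}^{1/n}}.$$

The relation between $k_2$ and $\alpha$ is easily found from the distribution of t when
$\beta = 0$, which is known to be [3]

$$\varphi_2(t)\,dt = \frac{dt}{\left(1 + \dfrac{t}{n-1}\right)^{n}}.$$

Therefore $\int_0^{k_2} \varphi_2(t)\,dt = 1 - \alpha$, so $\left(1 + \dfrac{k_2}{n-1}\right)^{-(n-1)} = \alpha$.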
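
The relation between $k_2$ and $\alpha$ can be checked numerically. The sketch below assumes that, under the hypothesis, t has the density $(1 + t/(n-1))^{-n}$ on $t \ge 0$, so that the acceptance probability is the integral of that density from 0 to $k_2$; the function names and the simple midpoint rule are illustrative choices.

```python
def k2(n, alpha):
    """Solve (1 + k / (n - 1)) ** -(n - 1) = alpha for k."""
    return (n - 1) * (alpha ** (-1.0 / (n - 1)) - 1.0)


def acceptance_probability(n, k, steps=20000):
    """Midpoint-rule integral of (1 + t / (n - 1)) ** -n over 0 <= t <= k."""
    h = k / steps
    return h * sum((1.0 + (i + 0.5) * h / (n - 1)) ** (-n) for i in range(steps))


n, alpha = 10, 0.05
limit = k2(n, alpha)
print(round(limit, 4), round(acceptance_probability(n, limit), 6))
```

The integral should come out at $1 - \alpha$ to within the accuracy of the quadrature.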

It is somewhat easier to find the power function of this test by considering the
region of acceptance as made up of points in the $x_1$, s plane for which

$$0 \le x_1 \le \frac{k_2 s}{n}, \qquad \text{where } s = \frac{\sum_{i=1}^{n}(x_i - x_1)}{n-1},$$

which is identical with the region in (1).

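The joint distribution of $x_1$ and s is [3]

$$f(x_1, s)\,dx_1\,ds = \varphi(x_1)\,dx_1\,\psi(s)\,ds,$$

where

$$\varphi(x_1)\,dx_1 = \frac{n}{\sigma}\, e^{-n(x_1 - \beta)/\sigma}\,dx_1$$

and

$$\psi(s)\,ds = \left(\frac{n-1}{\sigma}\right)^{n-1} \frac{s^{n-2}\, e^{-(n-1)s/\sigma}}{(n-2)!}\,ds.$$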

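When $\beta < 0$, the power function $P(\beta)$ of this test is

$$P(\beta) = 1 - \int_0^{\infty} ds \int_0^{k_2 s/n} f(x_1, s)\,dx_1 = 1 - e^{n\beta/\sigma}[1 - \alpha].$$

If $\beta > 0$, the power function is

(2)  $$P(\beta) = 1 - \int_{\beta n/k_2}^{\infty} ds \int_{\beta}^{k_2 s/n} f(x_1, s)\,dx_1
= \alpha e^{n\beta/\sigma}\left\{1 - I\!\left[n-1;\ \frac{(n-1+k_2)\,n\beta}{k_2\sigma}\right]\right\} + I\!\left[n-1;\ \frac{(n-1)\,n\beta}{k_2\sigma}\right],$$

where

$$I[p; x] = \frac{\Gamma_x(p)}{\Gamma(p)} = \frac{1}{\Gamma(p)} \int_0^x x^{p-1} e^{-x}\,dx,$$

which is the form in which the Incomplete Gamma Function has been tabulated
[5].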

Since $\sigma$ must be positive, $e^{n\beta/\sigma} < 1$ if $\beta < 0$ and therefore $P(\beta) > \alpha$ in the
interval $-\infty < \beta < 0$. To show that $P(\beta)$ is $> \alpha$ in the interval $0 < \beta < \infty$,
it is simpler to work with the expression for $P(\beta)$ as a double integral in (2),
than to differentiate the power function directly. Performing the integration
with respect to $x_1$,

$$P(\beta) = 1 + \int_{\beta n/k_2}^{\infty} \left[e^{-(k_2 s - n\beta)/\sigma} - 1\right] \psi(s)\,ds.$$

Differentiating with respect to $\beta$,

$$P'(\beta) = \int_{\beta n/k_2}^{\infty} \frac{n}{\sigma}\, e^{-(k_2 s - n\beta)/\sigma}\, \psi(s)\,ds.$$

The integral expression for $P'(\beta)$ is obviously positive. Therefore since for
$\beta > 0$ the derivative is always positive the function must be monotonically
increasing in this interval ($0 < \beta < +\infty$), so $P(\beta)$ is $> \alpha$ when $\beta > 0$.
Therefore this test is also completely unbiased.

We now consider the hypothesis H''' that two samples are drawn from
exponential distributions with the same location parameter, assuming it is known
the samples must have come from two exponential distributions with the same
scale parameter. Given a sample of $n_1$ values of x drawn from
$\frac{1}{\sigma} e^{-(x-\beta_1)/\sigma}\,dx$ and another independent sample of $n_2$ values of y drawn from
$\frac{1}{\sigma} e^{-(y-\beta_2)/\sigma}\,dy$, the hypothesis we wish to test is that $\beta_1 = \beta_2$. Let $x_1$ be the
smallest of the $n_1$ values of x and $y_1$ be the smallest of the $n_2$ values of y, and
let L be the smallest of the $n_1 + n_2 = N$ values of both x and y. Then the
likelihood ratio for this hypothesis is

$$\lambda_3 = \left[\frac{\sum_{i=1}^{n_1}(x_i - x_1) + \sum_{i=1}^{n_2}(y_i - y_1)}{\sum_{i=1}^{n_1}(x_i - L) + \sum_{i=1}^{n_2}(y_i - L)}\right]^{N} = \left[\frac{1}{1 + \dfrac{z}{u}}\right]^{N},$$

where

$$z = n_2(y_1 - x_1) \ \text{ if } y_1 > x_1, \qquad z = n_1(x_1 - y_1) \ \text{ if } x_1 > y_1,$$

and

$$u = \sum_{i=1}^{n_1}(x_i - x_1) + \sum_{i=1}^{n_2}(y_i - y_1).$$


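The region of acceptance, $\lambda_{3\alpha} \le \lambda_3 \le 1$, is equivalent to the region $0 \le z \le k_3 u$,
where $k_3$ is again a function of $\alpha$, the level of significance, the exact relation
being

$$\int_0^{k_3} \frac{(N-2)\,dt}{(1 + t)^{N-1}} = 1 - \alpha, \quad \text{so} \quad \frac{1}{(1 + k_3)^{N-2}} = \alpha.$$

It is known [3] that u is independent of z, and that its distribution is

$$\psi_3(u)\,du = \frac{u^{N-3}\, e^{-u/\sigma}}{\sigma^{N-2}\,(N-3)!}\,du.$$
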

The distribution of z is somewhat complicated; but it can be derived by observing
that the probability that z lies in any infinitesimal interval $z_1 \pm \tfrac{1}{2}\,dz_1$ is
the sum of the probabilities that $n_2(y_1 - x_1)$ and $n_1(x_1 - y_1)$ lie in that interval
and by then using standard methods for finding the distribution of the difference
of two variates. For the case $\theta = \beta_2 - \beta_1 > 0$, the distribution f(z) of z is

(3)  $$\begin{aligned}
f_1(z)\,dz &= \frac{1}{(n_1 + n_2)\sigma}\left[n_1 e^{-n_1(n_2\theta - z)/(n_2\sigma)} + n_2 e^{-(z + n_1\theta)/\sigma}\right] dz, & 0 \le z \le n_2\theta, \\
f_2(z)\,dz &= \frac{e^{-z/\sigma}}{(n_1 + n_2)\sigma}\left[n_1 e^{n_2\theta/\sigma} + n_2 e^{-n_1\theta/\sigma}\right] dz, & n_2\theta \le z \le \infty.
\end{aligned}$$

For $\theta \le 0$, the distribution of z can be derived from (3) by interchanging
$n_1$ and $n_2$, and putting $-\theta$ in place of $\theta$.

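The power function of this test can now be derived. For the case $\theta \ge 0$,
the power function $P(\theta)$ is

(4)  $$P(\theta) = 1 - \left\{ \int_0^{n_2\theta} f_1(z)\,dz - \int_0^{n_2\theta/k_3} du \int_{k_3 u}^{n_2\theta} f_1(z)\,\psi_3(u)\,dz + \int_{n_2\theta/k_3}^{\infty} du \int_{n_2\theta}^{k_3 u} f_2(z)\,\psi_3(u)\,dz \right\}.$$
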


Upon integrating out and simplifying, the power function becomes 


P{G) 


a(^ 

\ni ■ 


ni + n* 




—niO/v 


ni + wj 


Gint — nikz) 
h(r 


The power function when $\theta < 0$ is easily derived from that for $\theta > 0$ by
everywhere interchanging $n_1$ and $n_2$ and substituting $-\theta$ for $\theta$.

To show that $P(\theta) > \alpha$ when $\theta \ne 0$, it is only necessary to show that the
derivative $P'(\theta)$ of the power function is always positive when $\theta > 0$, and
always negative when $\theta < 0$. It is again considerably simpler to use the
expression for $P(\theta)$ as a double integral. For the case $\theta > 0$, integrating with respect
to z in (4),


P(G) 


Hi 


fll -f- 712 


[l^e 


-(yCni+nj)/®-! 


+ 


I 


ni + n* 




f ■[-e-"t':o 4 >k(u) du, 

•'njO/ki ni T Wj 


where $[f(x)]_a^b = f(b) - f(a)$. Upon differentiating and simplifying,

<n <»» 

P>(G) * - c-*’""]0,(tt)d« 

(ni + n2)<r Jo 


(nj + n»)<f Jnto/k, 


Both integrals are easily seen to be always positive, so $P'(\theta)$ is positive when
$\theta > 0$. In the same manner it can be shown that $P'(\theta)$ is negative when $\theta < 0$.
Therefore this test is also completely unbiased.

The questions of the bias of the likelihood-ratio tests for (a)
testing the hypothesis that $\sigma = \sigma_0$ when $\beta$ is known and (b) testing the
hypothesis that $\sigma = \sigma_0$, nothing being known about the value of $\beta$, are practically
identical with the analogous problems for a normal distribution. The results
are also the same, for the $\lambda$ test for (a) is completely unbiased, while that for
(b) is biased.
ON THE MATHEMATICALLY SIGNIFICANT FIGURES IN THE
SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS

By L. B. Tuckerman
The National Bureau of Standards

1. Introduction. The number of mathematically significant figures in the 
solution of simultaneous linear equations has received attention from a number 
of writers [1-6]. It is an important subject, not only in least squares and 
correlations, but in many other problems of science where simultaneous equa- 
tions arise: it may not be amiss, therefore, to examine it from a fresh start, 
particularly since (as will be shown) some of the rules that have been published
on it fail in certain frequently occurring circumstances. 

2. Definitions. Before proceeding into the subject it will be necessary to
distinguish between the computer's terms "significant figures" and "determinate
significant figures." The former are the figures that compose a number, without
the consecutive ciphers that precede or follow them merely to locate the decimal
point. "Determinate significant figures," on the other hand, are figures that
are justifiable on computational grounds. From the computer's point of view,
the number of significant figures remains independent of what is statistically
significant. To avoid confusion in what follows, the term "significant figures"
will be used in the computer's sense, and the adjective "determinate" will be
supplied where mathematical determinacy is implied.

To avoid prolixity the term "observational error" will include any uncertainty
arising either from errors in the observations or from the statistical nature of
the problem (e.g. sampling errors, grouping errors, etc.). The observational error
of the result is independent of the particular sequence of computation followed and
the accuracy with which it is carried out.

The term "computational error" will include all the additional uncertainties
arising from the approximations occurring in the particular sequence of
computation used, including the "rounding off" of the final result. The computational
errors, unlike the observational errors, depend in general upon the sequence of the
intermediate steps used in the computation as well as on the number of significant
figures to which they are carried.

3. Criterion of an adequate computation. If the number written down at the
end of a computation is to serve its purpose the maximum possible computational
error must be suitably limited.

A decimal representation of a number containing f significant figures is subject
to an uncertainty (upper limit of absolute error) of 5 in the (f + 1)th place.
It has, therefore, a possible relative (not absolute) error of representation
somewhere between $5 \times 10^{-f}$ and $5 \times 10^{-(f+1)}$ in magnitude. This relative
computational error sets the limit to any valid final rounding off. Regardless of the
accuracy to which the intermediate steps of the computation have been carried,
this relative computational error introduced by the final rounding off alone
must be suitably limited.
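
The bounds on the relative error of representation can be illustrated with a short calculation. This is only a sketch: the function name is hypothetical, and it assumes a nonzero number written to f significant figures, so that the absolute uncertainty is 5 units in the (f + 1)th figure.

```python
import math


def relative_error_bound(x, f):
    """Largest relative rounding error of x when written to f significant figures."""
    # Position of the leading figure: 10**e <= |x| < 10**(e + 1).
    e = math.floor(math.log10(abs(x)))
    # Five units in the (f + 1)th significant place.
    absolute_uncertainty = 5 * 10.0 ** (e - f)
    return absolute_uncertainty / abs(x)


for value in (1.00, 3.14, 9.99):
    print(value, relative_error_bound(value, 3))
```

For three significant figures the bound falls between $5 \times 10^{-4}$ and $5 \times 10^{-3}$, being largest when the leading figures are smallest (1.00) and smallest when they are largest (9.99).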

In case all of the accuracy obtainable from the data is not needed in the result, 
the sum of the maximum possible computational error (including the error of 
the final rounding off) and the maximum possible observational error must be 
kept below the error which can be tolerated in the result. 

In case all of the accuracy obtainable from the data is needed in the result,
the maximum possible computational error in the result (including the error of
the final rounding off) must be negligible in comparison with the uncertainty
(observational error) in the result arising from uncertainty in the data. Just
how small a fraction of the observational error is "negligible" is necessarily a matter
of judgment, and will depend upon the nature of the problem. A computational
error that would be wholly negligible in some ordinary computations might be
intolerably large in the adjustment of an accurate geodetic survey. In any case
the only basis for a valid judgment of the adequacy of the computation lies in a
comparison of (i) the maximum possible computational error that can arise in
the sequence of computations including the final "rounding off," with (ii) the
observational error of the result arising from the observational errors inherent
in the data.


4. Propagation of error in a system of linear equations. Assume that

(1)  $$\sum_{k} a_{sk}\, x_k = b_s, \qquad s = 1, \ldots, n,$$

is a set of simultaneous linear equations derived in some way from observations
and in which the coefficients $a_{sk}$ and the absolute terms $b_s$ may all be subject to
observational error. If the relative (not absolute) observational error of a
quantity q be represented by $\delta q$ it may readily be seen that

(2)  $$\begin{aligned}
\delta x_j &= -\sum_s \sum_k (x_k/x_j)\, A_{sj}\, a_{sk}\, \delta a_{sk} + \sum_s (b_s/x_j)\, A_{sj}\, \delta b_s, \\
\delta\Delta &= \sum_s \sum_k A_{sk}\, a_{sk}\, \delta a_{sk},
\end{aligned}$$

where $\Delta$ is the determinant of the coefficients $a_{sk}$, and $A_{sk}$ is the term
corresponding to $a_{sk}$ in the reciprocal (not the adjoint) determinant.


5. Upper limits to observational errors. The sign and magnitude of the
relative errors $\delta a_{sk}$ and $\delta b_s$ are unknown, but we shall assume that it is possible
in any problem to assign to them upper limits

$$|\delta a_{sk}| \quad \text{and} \quad |\delta b_s|$$

which in magnitude they cannot exceed. If the problem is such that the values
of each of the $\delta a_{sk}$ and the $\delta b_s$ are wholly independent of each other, it is then
possible that their magnitudes may all reach their upper limits $|\delta a_{sk}|$ and $|\delta b_s|$
simultaneously, in which case upper bounds of $\delta x_j$ and $\delta\Delta$ may be placed at

(3)  $$\begin{aligned}
|\delta x_j| &= \sum_s \sum_k |(x_k/x_j)\, A_{sj}\, a_{sk}|\,|\delta a_{sk}| + \sum_s |(b_s/x_j)\, A_{sj}|\,|\delta b_s|, \\
|\delta\Delta| &= \sum_s \sum_k |A_{sk}\, a_{sk}|\,|\delta a_{sk}|.
\end{aligned}$$


6. Indefiniteness of the problem in the general case. The values of the $\delta a_{sk}$
and $\delta b_s$ may not be independent of each other, in which circumstance knowledge
of the law of their dependence would make it possible to assign upper limits to
the magnitudes of $\delta x_j$ and $\delta\Delta$. These upper limits cannot be larger than the
upper bounds shown in equation (3), and in special cases they will be much
smaller. Since the dependence of $\delta a_{sk}$ and $\delta b_s$ may in general have any form
whatever, cases can and will occur in which the upper limits of the relative
errors $\delta x_j$ and $\delta\Delta$ may have any ratio whatever.

7. Case of independent errors. Any general discussion of the errors that can
occur in $x_j$ and $\Delta$ must be based either on some special assumption or on the
limiting assumption that the errors are independent. It is this latter
assumption that underlies the usual discussion, and will be the basis of what follows.
Equation (3) gives the upper limit to the $\delta x_j$ and $\delta\Delta$ under these assumptions.

8. The ratios of $|\delta x_j|$ and $|\delta\Delta|$ are still indefinite in spite of the assumption
of independent errors in the coefficients. However, equation (3) does not
determine any definite ratio or inequality between the upper bounds $|\delta x_j|$ and $|\delta\Delta|$.
The nature of the observations may be such that some of the errors in the $a_{sk}$
and $b_s$ are very small and some relatively large. Not infrequently it is safe to
assume that some of them are free from appreciable error and to ascribe all the
error of the $x_j$ to the error in one or two of the $a_{sk}$ or $b_s$. If any statement of a
definite relationship, either as an equality or an inequality between $|\delta\Delta|$ and
the $|\delta x_j|$ is valid for all possible sets of linear equations, it must at least hold
in the special case in which the errors of all the $b_s$ and the errors of all except one
of the $a_{sk}$ are negligible.

If such a statement of a definite general relationship between these upper
limits of errors can be made, it must be possible to write down an equation or an
inequality between any one of the expressions $|A_{sk}|$ and some or all of the
corresponding expressions $|(x_k/x_j) A_{sj}|$, j = 1, 2, ..., n, that will remain true
no matter what be the values of the $a_{sk}$ and the $b_s$ in the original set of
simultaneous equations. It is obvious that the ratio of $|A_{sk}|$ and $|(x_k/x_j) A_{sj}|$,
(j ≠ k), depends upon the values of the $a_{sk}$, and sets of equations can be found
to give any assigned value to that ratio. It is therefore impossible to state any
rule that will restrict the ratio of the relative error of $\Delta$ and the relative error
of any one of the $x_j$, valid for all possible sets of linear equations.

9. Definite statement about the sum of the relative errors in the unknowns.
However, in the summation $\sum_j |(x_k/x_j) A_{sj}|$ there occurs the term corresponding to
j = k, for which $|(x_k/x_j) A_{sj}| = |A_{sk}|$, so that under the assumption that the
$a_{sk}$ and $b_s$ are independent sources of error, we may write the inequality

(4)  $$\sum_j |\delta x_j| \ge |\delta\Delta|,$$

which states that the sum of the upper bounds to the relative errors of all the $x_j$
cannot be less than the upper bound to the relative error of the determinant $\Delta$.
A corresponding statement can easily be proved for the standard deviations.

A limiting case can be constructed in which the inequality (4) reduces to

(5)  $$\sum_j |\delta x_j| = |\delta\Delta|$$

and in which all of the $|\delta x_j|$ are equal. For this case,

(6)  $$|\delta\Delta| = n\,|\delta x_j| \quad \text{for all values of } j.$$

If n ≥ 10 it is obvious that there will be at least one more determinate
significant figure in each of the $x_j$ than in the determinant $\Delta$ of the coefficients.
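
The inequality between the summed bounds on the unknowns and the bound on the determinant can be checked on a small system. The sketch below assumes a 2 × 2 system with nonzero unknowns and the convention that A[s][k] is the cofactor of a[s][k] divided by the determinant; the coefficients, right-hand sides, and error limits are arbitrary illustrative numbers, and the function name is hypothetical.

```python
def error_bounds_2x2(a, b, da, db):
    """Upper bounds on the relative errors of x_1, x_2 and of the determinant.

    a  : 2x2 list of coefficients a[s][k]
    b  : list of absolute terms b[s]
    da : 2x2 list of upper limits on the relative errors of a[s][k]
    db : list of upper limits on the relative errors of b[s]
    """
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # A[s][k]: cofactor of a[s][k] divided by the determinant.
    A = [[a[1][1] / det, -a[1][0] / det],
         [-a[0][1] / det, a[0][0] / det]]
    idx = range(2)
    x = [sum(A[s][j] * b[s] for s in idx) for j in idx]
    bound_x = [
        sum(abs(x[k] / x[j] * A[s][j] * a[s][k]) * da[s][k] for s in idx for k in idx)
        + sum(abs(b[s] / x[j] * A[s][j]) * db[s] for s in idx)
        for j in idx
    ]
    bound_det = sum(abs(A[s][k] * a[s][k]) * da[s][k] for s in idx for k in idx)
    return x, bound_x, bound_det


a = [[2.0, 1.0], [1.0, 3.0]]
b = [5.0, 10.0]
da = [[1e-3, 1e-3], [1e-3, 1e-3]]
db = [1e-3, 1e-3]
x, bound_x, bound_det = error_bounds_2x2(a, b, da, db)
print(x, bound_x, bound_det, sum(bound_x) >= bound_det)
```

Setting some of the entries of `da` and `db` to zero shows how the individual bounds on the unknowns can fall below the bound on the determinant while their sum does not.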

It is frequently assumed that the number of determinate significant figures in
the solution for any unknown cannot exceed the number of determinate
significant figures in the determinant $\Delta$ of the coefficients. We see now that this
statement cannot be generally valid, even under the assumption that the $a_{sk}$ and
$b_s$ are independent sources of error. As a matter of fact, it is necessary in some
cases to compute some or even all of the unknowns to more significant figures
than are determinate in the determinant $\Delta$ of the coefficients, if one would retain
in the result all the accuracy that is obtainable from the data.

Cases in which the relative observational error of every one of the unknowns
is less than the relative error of the determinant $\Delta$ probably occur rarely in
practice; in fact the only ones that I have seen are those that I constructed
purposely to show that such a thing is possible. However, cases in which the
relative errors of one or several but not all of the unknowns are much smaller
than the relative error of the determinant $\Delta$ occur fairly frequently.

10. Remarks on the case of ^^near indeterminacy.” The major interest in 
curve fitting centers around the condition of ^'near indeterminacy,” i.e,, of a 
small or near vanishing determinant A. Even in the circumstance where the 
relative error of the determinant is much greater than the relative error of some 
or all of the coefficients and absolute terms, the relative error of one or more of 
the unknowns may be much smaller than the relative error of the determinant, 
as may be seen from what follows. 
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In accurate experimentation the «ideavor is, wherever posrible, to attsaiii^e the 
experiment so that the quantity sought comes directly from the measuremoit as 
represented by an equation such as 

( 7 ) a: - p. 

However, so ideal an experimental arrangement is rarely if ever possible, and it 
is a common experience to find that the measurements are represented by an 
equation such as 

(8) a; + ^ + rs + + * • • = p, 

where qy, n, m, etc., are small corrections that must somehow be evaluated. 
For simplicity, the discussion will be confined to the almost trivial case 

( 9 ) x + gy = p. 

Not infrequently the only way the correction can be evaluated is to rearrange the 
conditions of the experiment so that another equation is obtained in the form 

(10) x + q'y = p'.

Sometimes the nature of the experiment is such that it is not possible to change 
the coefficient of y by more than a small amount, under which conditions 

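(11) q' = q(1 + β),

and

(12) p' = p(1 + α),

where β and α are small in comparison with 1. The solution of equations (9)
and (10) now gives

(13) $x = \dfrac{\begin{vmatrix} p & q \\ p' & q' \end{vmatrix}}{\begin{vmatrix} 1 & q \\ 1 & q' \end{vmatrix}} = \dfrac{pq' - p'q}{q' - q} = p(1 - \alpha/\beta).$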


The quantity q' − q seen in the denominator of this equation is the determinant
A of the coefficients, and by equation (11) its value is βq. Since βq is assumed
to be small here, the solution for x encounters a near vanishing denominator.
It would, however, be wrong to assume that the number of determinate signifi- 
cant figures in x that can be obtained by solving the equations is necessarily 
limited to the number of determinate significant figures in the denominator A. 

If the experimenter has been fortunate in finding suitable experimental
conditions, the denominator A = βq, although small in comparison with either q' or q,
will still not cause difficulty. It will be observed that the coefficients of q' and q
in the denominator are equal (both being unity). Now if the coefficients p
and p' in the numerator are nearly enough equal, so that q' and q occur in both
numerator and denominator so nearly proportionally that the uncertainties in
q and q' produce nearly compensating errors in both numerator and denominator,
then x will be given to more determinate significant figures than are found in
the denominator A. It can then be said that the experiment is successful in
evaluating the correction term qy in equation (9).

On the other hand, in less fortunate circumstances, to the exasperation of the 
experimenter, the denominator A = q' − q = βq is not only small, but p' and p,
although still nearly equal, differ enough so that the errors in q' and q are not 
compensated by the nearly equal coefficients in the numerator. The experiment 
will then fail to improve the approximation p for x by failing to evaluate the 
small correction qy in equation (9). This would be an inherent defect in the 
experiment and could not be removed by any manner of computation. 

The same conclusion would of course be drawn from the coefficient of p (viz.,
1 − α/β) at the extreme right of equation (13). It is not the size of β that
alone determines the number of determinate significant figures in x; it is rather
the ratio between α and β. In the fortunate experimental circumstances
described above, the near equality of p' and p offsets the near equality of q' and q
by reducing the term α/β to a value small compared with unity; the term α/β,
being small, acts to reduce the effect of the uncertainties in q and q' (i.e., in q
and β) in the evaluation of x. On the other hand, in less fortunate
circumstances, the correction term α/β cannot shield x from the uncertainties in q
and q', since the relative difference α between p and p' is not small enough to
reduce α/β to innocuity.


11. Numerical illustration of compensating errors. As a “horrible example” 
especially constructed to emphasize the theoretical possibilities, take the fol- 
lowing special case — 


(14)    1000.10000x + 10.00000y = 1010.10000
        1000.00000x + 10.00000y = 1010.00000


wherein it is assumed that the coefficients and the absolute terms (assumed to 
be derived from the observational data) are all correct to the fifth decimal place 
as given, and no closer estimate of their errors is possible. So far as known, the 
upper limit to the absolute observational error of each is then the same, i.e.
5 × 10⁻⁶, but the coefficients of x ($a_{11}$ and $a_{21}$), and the absolute terms ($b_1$ and $b_2$),
all have nine determinate significant figures, while the coefficients of y ($a_{12}$
and $a_{22}$), have only seven. Thus, writing ε(·) for a relative error,

$|\epsilon(a_{11})| \le 5 \times 10^{-9}$, $|\epsilon(a_{21})| \le 5 \times 10^{-9}$, $|\epsilon(b_1)| \le 5 \times 10^{-9}$,

$|\epsilon(b_2)| \le 5 \times 10^{-9}$,

but

(15) $|\epsilon(a_{12})| \le 5 \times 10^{-7}$, $|\epsilon(a_{22})| \le 5 \times 10^{-7}$,

and x ≈ y ≈ 1, A ≈ 1, whereupon a substitution of values from (15) into (3)
gives the inequalities

(16) $|\epsilon(x)| \le 3 \times 10^{-4}$, $|\epsilon(y)| \le 3 \times 10^{-2}$, $|\epsilon(A)| \le 1.01 \times 10^{-2}$.

So far as known, the determinant A may thus be in error by as much as 1 per
cent, and y by as much as 3 per cent, yet x is known closer than 1/30th per cent.
Here the value of the unknown x cannot be adequately represented by less than 
four significant figures, and might even require five, in spite of the fact that 
neither A nor y requires more than three significant figures to represent all that 
is certainly known about them. 

The reason for this disparity in relative errors can be more easily seen by 
substituting numerical values for all the coefficients in the expression for x 
except $a_{12}$ and $a_{22}$. The possible relative errors of $a_{12}$ and $a_{22}$ are, as noted
above, about 100 times as great as the possible relative errors of $a_{11}$, $a_{21}$, $b_1$,
and $b_2$, and are the controlling errors in A. In the solution

(17) $x = \dfrac{1010.10000\,a_{22} - 1010.00000\,a_{12}}{1000.10000\,a_{22} - 1000.00000\,a_{12}},$

however, both $a_{12}$ and $a_{22}$ occur in both numerator and denominator, and
moreover the coefficient of each in the numerator is nearly equal to its coefficient in
the denominator, so that a change in either $a_{12}$ or $a_{22}$ changes both numerator
and denominator nearly proportionally, with the result that their ratio x is
known much more accurately than either the numerator or the denominator A.

This kind of compensation of errors in a computation is not confined to the 
solution of simultaneous equations (and it is not an infrequent occurrence in 
other computations). This is one of the many reasons why it is impossible to 
give general rules for the retention of significant figures that will be valid for 
all types of computations. 
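
The bounds in (16) can be checked numerically. The following is only an illustrative sketch, not part of the original computation: it assumes that every coefficient and absolute term of (14) may be off by at most 5 × 10⁻⁶, tries every combination of extreme perturbations, and records the largest relative change in x, y, and the determinant. The names `solve2` and `worst_relative_errors` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def solve2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by determinants."""
    det = a11 * a22 - a21 * a12
    x = (b1 * a22 - b2 * a12) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y, det

# Data of system (14), held exactly as rational numbers.
NOMINAL = [Fraction("1000.1"), Fraction(10), Fraction("1010.1"),
           Fraction(1000), Fraction(10), Fraction(1010)]
DELTA = Fraction(5, 10**6)  # assumed limit of error of each datum

def worst_relative_errors(nominal, delta):
    """Largest relative change in (x, y, det) over all extreme perturbations."""
    reference = solve2(*nominal)
    worst = [Fraction(0)] * 3
    for signs in product((-1, 1), repeat=len(nominal)):
        perturbed = [v + s * delta for v, s in zip(nominal, signs)]
        for k, (got, ref) in enumerate(zip(solve2(*perturbed), reference)):
            worst[k] = max(worst[k], abs(got - ref) / abs(ref))
    return tuple(float(w) for w in worst)
```

Calling `worst_relative_errors(NOMINAL, DELTA)` should return three numbers close to the limits stated in (16): the first (for x) about a hundred times smaller than the second (for y) and the third (for the determinant).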

12. Geometrical analogy. Moulton [4] illustrates his reasoning by the
following geometrical analogy. The solution of three linear equations is equivalent
to finding the point of intersection of three planes. When the determinant of 
the coefficients is small in comparison with the coefficients themselves, these 
planes are either nearly parallel, or the line of intersection of any two of them 
is nearly parallel to the third. In these cases small uncertainties in the location
of any one of the planes correspond to large uncertainties in the position of their 
point of intersection. 

In the first circumstance the planes might all be nearly parallel to one of the 
three coordinate planes, with the result that large uncertainty would afflict the
value of the determinant and two of the unknowns, the third being much more 
accurately determined. 

In the second circumstance, the line of intersection of two of the planes might 
be nearly parallel to one of the coordinate axes. When that happens, large
uncertainty will afflict the value of the determinant, but only one of the unknowns,
the other two being much more accurately determined. 

This geometrical analogy can be extended to cover simultaneous equations 
with any number of unknowns. Near-vanishing of the determinant A of the 
coefficients necessarily implies relatively large uncertainties in the determinant 
and also in at least one of the unknowns, but not necessarily in all of them. 
These are, of course, very special cases, but, as noted above, they are of frequent 
occurrence in actual problems. 


13. Evaluation of computational error. The relative computational error in
$x_j$ must be kept within certain definite limits which depend upon the particular
problem to be solved (section 3). To do this it is necessary to be able to calcu- 
late an upper bound to the relative computational error inherent in any particular 
sequence of computations. 

In many computations it is easy to write down a simple formula that will set
an upper bound to the relative computational error involved in that particular
sequence. This formula contains numbers $f_1$, $f_2$, $f_3$, etc., each representing the
number of significant figures accurately computed at some particular step.
Once a simple formula for relative computational error is written down, it is
easy to choose values of $f_1$, $f_2$, $f_3$, etc. that will give an upper bound to the
relative computational error not larger than the permissible limit of maximum
possible computational error outlined in section 3. This method of determining
an upper bound of the relative computational error should be used whenever such
a simple formula can be found. For example, to compute x from equation (13)
we may use the following sequence: $r_1 = q' - q$, $r_2 = r_1/q = \beta$, $r_3 = p' - p$,
$r_4 = r_3/p = \alpha$, $r_5 = r_4/r_2 = \alpha/\beta$, $r_6 = 1 - r_5 = 1 - \alpha/\beta$, $r_7 = p r_6 = p(1 - \alpha/\beta) = x$.

x may then be written as a function of these partial results, viz.:

(18) $x = r_7 = p r_6 = p(1 - r_5) = p(1 - r_4/r_2) = p(1 - r_3/(p r_2)) = p(1 - r_3 q/(p r_1)).$

Applying first order error theory we find

(19) $|\epsilon(x)| \le \left|\dfrac{\alpha/\beta}{1 - \alpha/\beta}\right| \{ |\epsilon(r_1)| + |\epsilon(r_2)| + |\epsilon(r_3)| + |\epsilon(r_4)| + |\epsilon(r_5)| \} + |\epsilon(r_6)| + |\epsilon(r_7)|,$

where $\epsilon(r_i)$ represents the relative error in $r_i$ arising from the computation by
which $r_i$ was determined from the preceding partial results, $r_1$, $r_2$, ···, $r_{i-1}$,
and $\epsilon(x)$ is the total relative computational error in x when so computed. It
is easy to keep $\epsilon(x)$ within any desired limits by suitably limiting each error term
of (19). Since a computation accurate to f significant figures involves a relative
computational error not greater than $5 \times 10^{-f}$, any desired limits can then be
set to each error term of (19) by a proper choice of the number of significant
figures that should be carried in that step.
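
As an illustration of how (19) can be used to choose the number of figures carried at each step, the following sketch evaluates the right-hand side of (19) for given α, β, and a list of seven figure counts. It assumes, as stated above, that a step carried to f significant figures contributes at most 5 × 10⁻ᶠ; the function names are hypothetical.

```python
def step_error(figures):
    """Upper bound of the relative error of one step carried to `figures` figures."""
    return 5.0 * 10.0 ** (-figures)

def bound_19(alpha, beta, figures):
    """Right-hand side of (19); figures[k] is the figure count used for r_(k+1)."""
    if len(figures) != 7:
        raise ValueError("one figure count is needed for each of r1..r7")
    eps = [step_error(f) for f in figures]
    ratio = alpha / beta
    amplification = abs(ratio / (1.0 - ratio))
    # r1..r5 enter through the term alpha/beta; r6 and r7 enter directly.
    return amplification * sum(eps[:5]) + eps[5] + eps[6]
```

When α/β is small, the first five steps are heavily discounted, so `bound_19(0.002, 0.05, [6, 6, 6, 6, 6, 8, 8])` is dominated by the last two steps; this is the sense in which fewer figures may be carried in the early steps.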

Unfortunately there seem to be no reasonably simple formulae for determining
upper bounds of the relative computational errors that arise in the solution of 
simultaneous linear equations in more than two variables. This does not ab- 
solve the computer from the necessity of ensuring that his computational errors 
are suitably limited. 

The method I have found most economical is to carry the solution of
simultaneous linear equations to the capacity of the machine, and as each partial result
$r_i$ is obtained, write it as

$r_i(1 \pm \epsilon_i),$

where $r_i$ is the value actually found and $\epsilon_i$ is a positive number representing the
accumulation of uncertainty introduced by all preceding steps in the
computation. At the end of the computation each of the unknowns is found in the form

(20) $x_j(1 \pm \epsilon_j),$

where $x_j$ represents the value found and $\epsilon_j$ is the upper bound of the relative
computational error in $x_j$.

A comparison of $\epsilon_j$ with the upper bound of the observational error of $x_j$ given by
equation (3) will then indicate whether the computation is adequate. If the 
comparison shows that the computation was inadequate, it will show in which 
steps the number of significant figures was too small, and by how much. 
The computer can recompute, carrying these steps to the requisite number of 
figures with the assurance that his recomputation will then be adequate. The 
comparison will further indicate in which steps, if any, the number of significant
figures $f_i$ was larger than necessary.
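
The bookkeeping described above, in which every partial result is carried as a value together with an accumulated relative uncertainty, might be sketched as follows. This is an assumption-laden illustration rather than the procedure actually used on the machine: it applies first order error theory only, charges 5 × 10⁻ᶠ for each step carried to f figures, and treats the data p, p', q, q' as exact. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tracked:
    value: float
    rel: float = 0.0  # upper bound of accumulated relative error

def _rounding(figures):
    return 5.0 * 10.0 ** (-figures)

def sub(a, b, figures):
    diff = a.value - b.value
    # Absolute uncertainties add; divide by the (possibly small) difference.
    inherited = (abs(a.value) * a.rel + abs(b.value) * b.rel) / abs(diff)
    return Tracked(diff, inherited + _rounding(figures))

def mul(a, b, figures):
    return Tracked(a.value * b.value, a.rel + b.rel + _rounding(figures))

def div(a, b, figures):
    return Tracked(a.value / b.value, a.rel + b.rel + _rounding(figures))

def solve_x(p, p_prime, q, q_prime, figures=8):
    """Follow the sequence r1..r7 of section 13 for x = p(1 - alpha/beta)."""
    P, P2, Q, Q2, one = (Tracked(v) for v in (p, p_prime, q, q_prime, 1.0))
    r1 = sub(Q2, Q, figures)
    r2 = div(r1, Q, figures)
    r3 = sub(P2, P, figures)
    r4 = div(r3, P, figures)
    r5 = div(r4, r2, figures)
    r6 = sub(one, r5, figures)
    return mul(P, r6, figures)
```

For the two-equation case the final `rel` agrees with the right-hand side of (19); the same bookkeeping extends step by step to larger systems, where no simple closed formula is available.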

When a computer has thus set suitable upper bounds to the relative computa- 
tional error in the solution of a set of linear equations, he is in a position to plan 
solutions of future similar sets so as to perform his computations more eco- 
nomically and yet safely. This is especially true when the solution of simulta- 
neous linear equations arises week after week in routine testing. 

14. Conclusions. Summary rules have been published, purporting to be safe 
guides to computers in avoiding needless work, and ensuring that the computa- 
tions are carried to a sufficient degree of accuracy. Many of them are useful 
guides for certain types of computation and for limited ranges of the numerical 
values entering into the computation, but none of those that I have seen can be 
used generally. The only safe rule, where the matter is of importance, is to 
calculate the maximum possible computational error that can enter in the par- 
ticular sequence of computation followed, and make sure that it is kept within 
the necessary limits. 

It is sometimes necessary to carry the intermediate steps of a computation to 
many significant figures beyond the significant figures given in the data, or kept 
in the result. The relative error of one of the unknowns may be very much 
smaller than the relative errors of the data from which it is computed, while the 
relative error of another of the unknowns may be larger. The methods of
ensuring that the computations are adequate are outlined in section 13. 

For the best sequence to follow in the elimination of the unknowns, I shall 
pass along a suggestion of Dr. W. Edwards Deming which he gave in one of our 
discussions of this subject. I venture to pass it along, because it has worked in 
every special case that I have constructed in an attempt to prove that it does
not hold generally. If ever the suggestion fails, the computer may change the 
sequence; but in any case he is obliged, as stated above, to calculate the maximum 
possible computational error that can enter into his calculations. Dr. Deming’s 
suggestion is this: “To evaluate some but not all of the unknowns to the highest 
possible computational accuracy, retaining as few significant figures as possible 
in the intermediate steps, solve the equations by successive elimination, elimi- 
nating first and evaluating last the unknowns of greatest inherent relative 
accuracy.” 

15. Summary. Expressions are given for the maximum observational error 
in the unknowns of a system of simultaneous linear equations, in terms of the 
relative errors of the coefficients and absolute terms therein. In order to extract 
all the information possible from a system of linear equations representing ob- 
servational results, it is not sufficient in general to assume that the relative errors 
in the unknowns are as large as the relative error in the determinant of the 
system. In many problems the computation of some of the unknowns must 
therefore be carried to more significant figures than are determinate in the 
determinant of the system. Methods are outlined for evaluating computational 
error in the solution of linear equations to ensure that the computations are 
adequate. 

In conclusion I wish to express my thanks to Dr. W. Edwards Deming who 
has given much of his time to assist me in the preparation of this paper. He has 
made valuable suggestions on the material to be included and the general manner 
of presentation. In addition he has criticized the manuscript in detail and 
assisted in the final revision. 

ON MECHANICAL TABULATION OF POLYNOMIALS

By J. C. McPherson

International Business Machines Corporation

1. Introduction. The purpose of this paper is to show how automatic
accounting machines, which have been used previously in evaluating such
quantities as $\Sigma x^n$ and $\Sigma xy$, may be used in the preparation of mathematical
tables of integral powers, of polynomials, and of functions which can be
approximated by polynomials. These tables may be prepared for any desired intervals
of the argument, such as 1, .1, .01, or fractional intervals.

The method is an adaptation of the general theory of “cumulative” or “pro- 
gressive” totals which has proved useful in computing moments and product 
moments both with and without accounting machines. The reader unfamiliar 
with the mathematical method and its machine applications might refer to such 
presentations as those of Hardy [1], Mendenhall and Warren [2, 3], Razram and
Wagner [4], Brandt [5], and Dwyer [6, 7]. The main feature of the method is
the computation of summed products or of summed powers by means of succes- 
sive cumulated additions. It is shown in this paper how it is possible to use 
this same process in constructing tables of powers and tables of polynomials. 


2. The Cumulative Formulas. If the numbers $F_x$ are defined and finite
for x = 1, 2, 3, ···, (a − 1), a, and if these values of $F_x$ are cumulated for x =
a, x = a − 1, etc., then the value in the row headed by x = 1 can be written
as ${}^{1}T_{1}$. If these cumulations are cumulated successively with the superscript
indicating the order of the cumulation and the subscript indicating the value of x
which heads the row, then

${}^{4}T_{1} = \sum_{x} \dfrac{(x + 2)(x + 1)x}{3!} F_x,$

and in general for i ≤ j,

(1) ${}^{j}T_{i} = \sum_{x} \dbinom{x + j - i - 1}{j - 1} F_x.$


Formula (1) is basic to much of the previous work involving cumulative totals. 
Various authors have studied such important special cases as (A) where $F_x$
equals the frequency function $f_x$, (B) where $F_x = x f_x$, and (C) where $F_x$ equals
the sum of all the values of y having the same x value. These special cases have
been found very useful in computing moments and product moments. 
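
The successive cumulations and formula (1) can be illustrated by a short sketch. It is offered only as an aid to reading the notation: the list `F` holds $F_1, \ldots, F_a$, the running totals are taken from x = a back to x = 1, and the closed form is the binomial sum of (1) restricted to x ≥ i (the omitted terms vanish when i ≤ j). The function names are hypothetical.

```python
from math import comb

def cumulate_desc(values):
    """Running totals taken from the last entry (x = a) back to the first (x = 1)."""
    totals, running = [], 0
    for v in reversed(values):
        running += v
        totals.append(running)
    return totals[::-1]

def cumulations(F, order):
    """Return {j: [jT_1, ..., jT_a]} for j = 1..order, where F[0] is F_1."""
    table, column = {}, list(F)
    for j in range(1, order + 1):
        column = cumulate_desc(column)
        table[j] = column
    return table

def closed_form(F, j, i):
    """Formula (1): sum over x >= i of C(x + j - i - 1, j - 1) * F_x."""
    return sum(comb(x + j - i - 1, j - 1) * Fx
               for x, Fx in enumerate(F, start=1) if x >= i)
```

For any list of numbers, `cumulations(F, order)[j][i - 1]` and `closed_form(F, j, i)` should agree, which is the content of formula (1).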

The moments may be expressed in terms of the cumulations in a variety of
ways. The diagonal formulas have the differences of zero as coefficients and are
expressed in terms of ${}^{1}T_{1}$, ${}^{2}T_{1}$, ${}^{3}T_{2}$, ${}^{4}T_{3}$, ${}^{5}T_{4}$, etc. The columnar formulas,
whose coefficients have been recently studied [6, 7], are expressed in terms of
cumulations of the same order, ${}^{j}T_{i}$, with j fixed. Razram and Wagner [4]
have given formulas which utilize the entries of different rows and different
columns but which demand fewer entries for the formulas. Razram and
Wagner worked out the formulas only for the lower powers, but the argument holds
for $\Sigma x^s F_x$ generally.
For purposes of comparison the values of $\Sigma x^s F_x$, s = 0, 1, 2, 3, 4, as they appear
in the diagonal, columnar, and Razram-Wagner systems are presented in
Table I.


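TABLE I

Values of $\Sigma x^s F_x$ for s = 0, 1, 2, 3, 4.

Columns: Diagonal; Columnar; Razram-Wagner.
Rows: $\Sigma F_x$; $\Sigma xF_x$; $\Sigma x^2F_x$; $\Sigma x^3F_x$; $\Sigma x^4F_x$.
[The entries of this table are not legible in this copy.]
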

In developing the theory of the later sections of this paper I have developed
further formulas of the type shown by Razram and Wagner since these formulas
have fewer terms than do those of the other systems and the coefficients are
factorable by (j − 1)!/2. These formulas for $\Sigma x^s F_x$ with s even feature such terms
as ${}^{3}T_{1} + {}^{3}T_{2}$, ${}^{5}T_{2} + {}^{5}T_{3}$, etc., so that there are two entries from the same
column. For the purposes of this paper it is preferable to have a single entry
from each column and this situation results from continued application of the
formula

(2) ${}^{j}T_{i+(i+1)} = {}^{j}T_{i} + {}^{j}T_{i+1} = {}^{j-1}T_{i} + 2\,{}^{j}T_{i+1}.$

The formulas for $\Sigma x^s F_x$ with s ≤ 12 are given. The alternative forms are given
for the formulas involving even values of s.

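(3)

$\Sigma F_x = {}^{1}T_{1}$, $\Sigma xF_x = {}^{2}T_{1}$,

$\Sigma x^{2}F_x = {}^{3}T_{1} + {}^{3}T_{2} = {}^{3}T_{1+2} = {}^{2}T_{1} + 2\,{}^{3}T_{2}$,

$\Sigma x^{3}F_x = {}^{2}T_{1} + 6\,{}^{4}T_{2}$,

$\Sigma x^{4}F_x = {}^{3}T_{1+2} + 12\,{}^{5}T_{2+3}$
$\quad = {}^{2}T_{1} + 2\,{}^{3}T_{2} + 12\,{}^{4}T_{2} + 24\,{}^{5}T_{3}$,

$\Sigma x^{5}F_x = {}^{2}T_{1} + 30\,{}^{4}T_{2} + 120\,{}^{6}T_{3}$,

$\Sigma x^{6}F_x = {}^{3}T_{1+2} + 60\,{}^{5}T_{2+3} + 360\,{}^{7}T_{3+4}$
$\quad = {}^{2}T_{1} + 2\,{}^{3}T_{2} + 60\,{}^{4}T_{2} + 120\,{}^{5}T_{3} + 360\,{}^{6}T_{3} + 720\,{}^{7}T_{4}$,

$\Sigma x^{7}F_x = {}^{2}T_{1} + 126\,{}^{4}T_{2} + 1680\,{}^{6}T_{3} + 5040\,{}^{8}T_{4}$,

$\Sigma x^{8}F_x = {}^{3}T_{1+2} + 252\,{}^{5}T_{2+3} + 5040\,{}^{7}T_{3+4} + 20160\,{}^{9}T_{4+5}$
$\quad = {}^{2}T_{1} + 2\,{}^{3}T_{2} + 252\,{}^{4}T_{2} + 504\,{}^{5}T_{3} + 5040\,{}^{6}T_{3} + 10080\,{}^{7}T_{4} + 20160\,{}^{8}T_{4} + 40320\,{}^{9}T_{5}$,

$\Sigma x^{9}F_x = {}^{2}T_{1} + 510\,{}^{4}T_{2} + 17640\,{}^{6}T_{3} + 151200\,{}^{8}T_{4} + 362880\,{}^{10}T_{5}$,

$\Sigma x^{10}F_x = {}^{3}T_{1+2} + 1020\,{}^{5}T_{2+3} + 52920\,{}^{7}T_{3+4} + 604800\,{}^{9}T_{4+5} + 1814400\,{}^{11}T_{5+6}$
$\quad = {}^{2}T_{1} + 2\,{}^{3}T_{2} + 1020\,{}^{4}T_{2} + 2040\,{}^{5}T_{3} + 52920\,{}^{6}T_{3} + 105840\,{}^{7}T_{4} + 604800\,{}^{8}T_{4} + 1209600\,{}^{9}T_{5} + 1814400\,{}^{10}T_{5} + 3628800\,{}^{11}T_{6}$,

$\Sigma x^{11}F_x = {}^{2}T_{1} + 2046\,{}^{4}T_{2} + 168960\,{}^{6}T_{3} + 3160080\,{}^{8}T_{4} + 19958400\,{}^{10}T_{5} + 39916800\,{}^{12}T_{6}$,

$\Sigma x^{12}F_x = {}^{3}T_{1+2} + 4092\,{}^{5}T_{2+3} + 506880\,{}^{7}T_{3+4} + 12640320\,{}^{9}T_{4+5} + 99792000\,{}^{11}T_{5+6} + 239500800\,{}^{13}T_{6+7}$
$\quad = {}^{2}T_{1} + 2\,{}^{3}T_{2} + 4092\,{}^{4}T_{2} + 8184\,{}^{5}T_{3} + 506880\,{}^{6}T_{3} + 1013760\,{}^{7}T_{4} + 12640320\,{}^{8}T_{4} + 25280640\,{}^{9}T_{5} + 99792000\,{}^{10}T_{5} + 199584000\,{}^{11}T_{6} + 239500800\,{}^{12}T_{6} + 479001600\,{}^{13}T_{7}$.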

The derivation of these formulas is obtained with the use of (1), with the use of

(4) ${}^{j}T_{i} = {}^{j}T_{i+1} + {}^{j-1}T_{i},$

and with the use of formulas of lower order. For example we have from (1)

$\sum_{x} \dfrac{(x + 4)(x + 3)(x + 2)(x + 1)x}{5!} F_x = {}^{6}T_{1},$

so that

$\Sigma x^{5}F_x = 120\,{}^{6}T_{1} - 10\,\Sigma x^{4}F_x - 35\,\Sigma x^{3}F_x - 50\,\Sigma x^{2}F_x - 24\,\Sigma xF_x,$

which after substitution of $\Sigma x^{4}F_x$, $\Sigma x^{3}F_x$, etc. and simplification results in the
value ${}^{2}T_{1} + 30\,{}^{4}T_{2} + 120\,{}^{6}T_{3}$.
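
Formulas of this kind are easily checked numerically. The sketch below is illustrative only: it evaluates each cumulation by the binomial form of (1), restricted to x ≥ i, and compares a few of the single-entry formulas (3) with the directly computed sums $\Sigma x^s F_x$. The names `T`, `FORMULAS`, and `power_sum_from_cumulations` are hypothetical.

```python
from math import comb

def T(F, j, i):
    """jT_i for the list F = [F_1, ..., F_a], from formula (1)."""
    return sum(comb(x + j - i - 1, j - 1) * Fx
               for x, Fx in enumerate(F, start=1) if x >= i)

# power s -> terms (coefficient, j, i) of the single-entry form of (3)
FORMULAS = {
    1: [(1, 2, 1)],
    2: [(1, 2, 1), (2, 3, 2)],
    3: [(1, 2, 1), (6, 4, 2)],
    4: [(1, 2, 1), (2, 3, 2), (12, 4, 2), (24, 5, 3)],
    5: [(1, 2, 1), (30, 4, 2), (120, 6, 3)],
    7: [(1, 2, 1), (126, 4, 2), (1680, 6, 3), (5040, 8, 4)],
}

def power_sum_from_cumulations(F, s):
    """Sum of x**s * F_x rebuilt from cumulations only."""
    return sum(c * T(F, j, i) for c, j, i in FORMULAS[s])

def power_sum_direct(F, s):
    return sum(x ** s * Fx for x, Fx in enumerate(F, start=1))
```

The two functions should return the same value for every power listed in `FORMULAS` and for any list `F`.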

3. Tables of powers. If $F_x$ = 1 when x = a, but is zero otherwise,
then $\Sigma x^s F_x$ is equal to $a^s$. It follows that the value of $a^s$ can be obtained from
the successive cumulations of this $F_x$ with the use of (3). For example in
Table II


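TABLE II

Cumulations of $F_x$ = 1 when x = 6; $F_x$ = 0 when x ≠ 6.
(Entries between asterisks are those used in the example for a = 6.)

  a     x    F_x   ${}^{1}T$   ${}^{2}T$   ${}^{3}T$   ${}^{4}T$   ${}^{5}T$
  1     6     1        1           1           1           1           1
  2     5     0        1           2           3           4           5
  3     4     0        1           3           6          10          15
  4     3     0        1           4          10          20        *35*
  5     2     0        1           5        *15*        *35*          70
  6     1     0        1         *6*          21          56         126
  7     0     0        1           7          28          84         210
  8    −1     0        1           8          36         120         330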

$6^2 = {}^{2}T_{1} + 2\,{}^{3}T_{2} = 6 + 2(15) = 36,$

$6^3 = {}^{2}T_{1} + 6\,{}^{4}T_{2} = 6 + 6(35) = 216,$

$6^4 = {}^{2}T_{1} + 2\,{}^{3}T_{2} + 12\,{}^{4}T_{2} + 24\,{}^{5}T_{3} = 6 + 2(15) + 12(35) + 24(35) = 1296.$

The values of ${}^{2}T_{1}$, ${}^{3}T_{2}$, ${}^{4}T_{2}$ and ${}^{5}T_{3}$ for a = 6 are marked in Table II.

To get the values of 5², 5³, 5⁴, etc. it would be necessary to start to cumulate
from x = 5. Now since the values of ${}^{1}T$ are unity, it follows that the values
for a = 5 can be found by taking the entries above those for a = 6. Thus
${}^{2}T_{1}$ = 5, ${}^{3}T_{2}$ = 10, ${}^{4}T_{2}$ = 20, ${}^{5}T_{3}$ = 15 with 5² = 5 + 2(10), 5³ = 5 + 6(20),
5⁴ = 5 + 2(10) + 12(20) + 24(15). It is evident in general that the values for
any a², a³, a⁴ can be obtained by taking the row headed by a as the bottom row.
Thus using a = 8, we have 8² = 8 + 2(28), 8³ = 8 + 6(84), etc. It then appears
that we may omit the x column of Table II and consider the cumulations to be
ascending cumulations for a rather than descending cumulations for x.
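
The reading of Table II "from the bottom row upward" can be illustrated as follows. This is a sketch under the assumption that the table is built by cumulating 1, 0, 0, ··· downward repeatedly, and that ${}^{j}T_{i}$ for a given a is the entry in column j lying i − 1 rows above the row headed by a (taken as zero when that row would fall above the table). The function names are hypothetical.

```python
def impulse_cumulations(rows, order):
    """Columns 1T..orderT obtained by cumulating 1, 0, 0, ... downward."""
    column = [1] + [0] * (rows - 1)
    columns = []
    for _ in range(order):
        running, totals = 0, []
        for v in column:
            running += v
            totals.append(running)
        column = totals
        columns.append(column)
    return columns  # columns[j - 1][a - 1] is the entry of column jT in row a

def T(columns, j, i, a):
    """jT_i when the row headed by a is taken as the bottom row."""
    row = a - i + 1
    return columns[j - 1][row - 1] if row >= 1 else 0

def powers(a, columns):
    """Square, cube and fourth power of a from formulas (3)."""
    square = T(columns, 2, 1, a) + 2 * T(columns, 3, 2, a)
    cube = T(columns, 2, 1, a) + 6 * T(columns, 4, 2, a)
    fourth = (T(columns, 2, 1, a) + 2 * T(columns, 3, 2, a)
              + 12 * T(columns, 4, 2, a) + 24 * T(columns, 5, 3, a))
    return square, cube, fourth
```

With `columns = impulse_cumulations(8, 5)`, the call `powers(6, columns)` uses exactly the four entries 6, 15, 35, 35 of the worked example.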

A more satisfactory course is to cumulate the coefficients so as to eliminate
the multiplications. Thus the value of 6 ${}^{j}T_{i}$ could be obtained without
multiplication by cumulating 6, 0, 0, 0, 0, ··· rather than 1, 0, 0, 0, ···. Several
cumulations may be carried on at the same time so that the additions are not
necessary and the tabulation results in a table of the desired powers.

In preparation of a power table, the formulas (3) become a series of instruc- 
tions on the way in which we are to do the cumulating. For instance the 
formula: 

$x^7 = 5040\,{}^{8}T_{4} + 1680\,{}^{6}T_{3} + 126\,{}^{4}T_{2} + {}^{2}T_{1},$

tells us that to form a table of the seventh power we must cumulate¹ the
coefficient 5040 eight times; add in the coefficient 1680 when there are six operations;
the coefficient 126 when there are four; and the coefficient 1 when there are two 
remaining. A change in subscript tells us that the coefficient when first included 
forms a separate total ahead of the ones already partly figured. When the sub- 
script does not change, the coefficient is to be included in the first summary card 
total. The final cumulating operation prints the actual table. 

To prepare a power table by machine we secure a set of cards punched all alike 
with the numbers from 1 to 9 punched diagonally in successive columns across 
the card. The machine is wired to add the coefficient of the highest term by 
selecting the proper digits from the diagonals, cumulate after each card and sum- 
mary punch each total. This way of starting saves one cumulation. The 
summary cards are cumulated repeatedly in the same manner until the number 
of operations indicated by the highest term is completed. When the number of 
operations remaining equals j of another term ^Ti , a card for the coefficient of 
that term is included in the tabulation ahead of the summary cards. This 
automatically adds the new coefficient to each term of the series. When the 
subscript i in ^Ti changes,, the new coefficient card must form a separate total; 

¹ This operation is generally known as progressive totalling in machine operation.

when it does not change, the coefficient card must tabulate in the first summary
card total. 

To illustrate the tabulation of power tables, the formula for the cube table is

$x^3 = 6\,{}^{4}T_{2} + {}^{2}T_{1}.$

The successive operations yield the following table: 


TABLE III

                Operation number
   x       1       2       3       4

   1       0       0       1       1
   2       6       6       7       8
   3       6      12      19      27
   4       6      18      37      64
   5       6      24      61     125
   6       6      30      91     216
   7       6      36     127     343
   8       6      42     169     512
   9       6      48     217     729
  10       6      54     271    1000


In actual machine work, operation 1 can be omitted and work begun with
operation 2. The machine is set to add the coefficient 6 of the highest term from
each card and an accumulated total is printed and punched for each card
tabulated, giving the results shown under operation 2. An additional card is punched
for the coefficient of the second term, 1, and placed ahead of the cards produced
in operation 2. The cumulation and punching is repeated, giving the results 
shown under operation 3. The summary cards from this operation are cumu- 
latively tabulated, giving the results shown under operation 4, which is the 
table of cubes desired. 
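
The sequence of operations can be imitated in a few lines. The following sketch is an interpretation of the procedure, not a description of the machine wiring: each term $c\,{}^{j}T_{i}$ is entered as the number c in row x = i just before the operation at which j operations remain, and every operation replaces the column by its running totals. The function name and the `(coefficient, j, i)` layout are hypothetical.

```python
def progressive_table(terms, last_x, first_x=1):
    """Return one column per operation; terms is a list of (coefficient, j, i)."""
    n_ops = max(j for _, j, _ in terms)
    column = [0] * (last_x - first_x + 1)
    columns = []
    for op in range(1, n_ops + 1):
        remaining = n_ops - op + 1
        for coefficient, j, i in terms:
            if j == remaining:
                # The new coefficient card enters at the row headed by x = i.
                column[i - first_x] += coefficient
        running, totals = 0, []
        for v in column:
            running += v
            totals.append(running)
        column = totals
        columns.append(column)
    return columns
```

For the cube table, `progressive_table([(6, 4, 2), (1, 2, 1)], last_x=10)` yields four columns, the last of which is the table of cubes.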

Similarly, for a table of the fourth power, the formula
$x^4 = 24\,{}^{5}T_{3} + 12\,{}^{4}T_{2} + 2\,{}^{3}T_{2} + {}^{2}T_{1}$ indicates the following operations:


TABLE IV

                    Operation number
   x       1       2       3       4       5

   1       0       0       0       1       1
   2       0      12      14      15      16
   3      24      36      50      65      81
   4      24      60     110     175     256
   5      24      84     194     369     625
   6      24     108     302     671    1296
   7      24     132     434    1105    2401
   8      24     156     590    1695    4096
   9      24     180     770    2465    6561
  10      24     204     974    3439   10000
  11      24     228    1202    4641   14641
  12      24     252    1454    6095   20736

Note in operation 3 where the subscript does not change, the coefficient 2 is
added to the first card punched by the machine, while in operation 4 where it 
changes, the coefficient 1 appears as a separate total. 

4. Tables of polynomials. To tabulate values of f(x) = a + bx + cx² + ···
(where a, b, c, ···, are positive or negative coefficients) the method is similar
to that of preparing power tables except that the coefficients to be added are
determined by multiplying the coefficients of the formulas for the different powers
by the values a, b, c, etc., adding the coefficients of like terms in the various
formulas, and using these resultant coefficients in place of the simple coefficients
used in the power tables. Thus if we wish to tabulate values of f(x) = 4 + 3x +
2x² + x⁵ the coefficients are found as follows:

  $4x^0 = 4\,{}^{1}T_{0}$
  $3x = 3\,{}^{2}T_{1}$
  $2x^2 = 2\,{}^{2}T_{1} + 2 \cdot 2\,{}^{3}T_{2}$
  $x^5 = {}^{2}T_{1} + 30\,{}^{4}T_{2} + 120\,{}^{6}T_{3}$

  $f(x) = 4\,{}^{1}T_{0} + 6\,{}^{2}T_{1} + 4\,{}^{3}T_{2} + 30\,{}^{4}T_{2} + 120\,{}^{6}T_{3}$

This equation gives instructions to perform six operations with 120 as coeffi- 
cient; adding the coefficient 30 as a separate total when there are 4 operations 
remaining; adding 4 to the first summary card total when there are 3 operations; 
adding 6 as a separate total when there are 2 operations remaining; and adding 4 
on the last operation. 

The first few totals appear thus — 


TABLE V

(Entries between asterisks are those used in the subdivision example of section 6.)

                          Operation number
   x        1        2        3        4        5        6

   0                                                       4
   1        0        0        0        0        6       10
   2        0        0       30       34       40       50
   3      120      120      150      184      224      274
   4      120      240      390      574      798     1072
   5      120      360      750     1324     2122     3194
   6      120      480     1230     2554     4676     7870
   7      120      600     1830     4384     9060  *16930*
   8      120      720     2550     6934  *15994*    32924
   9      120      840   *3390*  *10324*    26318    59242
  10    *120*    *960*     4350    14674    40992   100234
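
The combination of coefficients described in this section can be written out as a short sketch. It is illustrative only: the dictionary `POWER_TERMS` transcribes the single-entry formulas (3) for the powers 0 to 5 as `{(j, i): coefficient}`, and `evaluate` uses the closed form of each cumulation of a unit entry placed in row x = i, namely the binomial coefficient C(x − i + j − 1, j − 1) for x ≥ i, instead of imitating the machine operations. All names are hypothetical.

```python
from math import comb

# Single-entry formulas (3), written as {(j, i): coefficient of jT_i}.
POWER_TERMS = {
    0: {(1, 0): 1},
    1: {(2, 1): 1},
    2: {(2, 1): 1, (3, 2): 2},
    3: {(2, 1): 1, (4, 2): 6},
    4: {(2, 1): 1, (3, 2): 2, (4, 2): 12, (5, 3): 24},
    5: {(2, 1): 1, (4, 2): 30, (6, 3): 120},
}

def cumulative_coefficients(poly):
    """poly maps power -> coefficient; like terms jT_i are added together."""
    combined = {}
    for power, a in poly.items():
        for key, c in POWER_TERMS[power].items():
            combined[key] = combined.get(key, 0) + a * c
    return combined

def evaluate(coefficients, x):
    """Value tabulated in the row headed by x (x a non-negative integer)."""
    return sum(c * comb(x - i + j - 1, j - 1)
               for (j, i), c in coefficients.items() if x >= i)
```

For `{0: 4, 1: 3, 2: 2, 5: 1}` the combined coefficients are 4, 6, 4, 30 and 120, as in the text, and `evaluate` reproduces the last column of Table V.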


It is not necessary to confine these tables to values for whole numbers, as we
can tabulate equally well values of f(x) for intervals of x of .1, .01 or .001, or
for fractional intervals. In this case, before combining formulas for different powers we
multiply both sides by the desired interval raised to the power to which x is raised in
that particular formula, then add like terms as before.

To tabulate the previous example in intervals of 0.1 in x we proceed as follows:

  $4x^0 = 4.000\,{}^{1}T_{0}$
  $3x/10 = .3\,{}^{2}T_{1}$
  $2(x/10)^2 = .02\,{}^{2}T_{1} + .04\,{}^{3}T_{2}$
  $(x/10)^5 = .00001\,{}^{2}T_{1} + .00030\,{}^{4}T_{2} + .00120\,{}^{6}T_{3}$

  $f(x) = 4\,{}^{1}T_{0} + .32001\,{}^{2}T_{1} + .04\,{}^{3}T_{2} + .00030\,{}^{4}T_{2} + .00120\,{}^{6}T_{3}$


TABLE VI

                              Operation number
   x        1        2        3        4         5          6

   1        0        0        0        0    .32001    4.32001
   2        0        0    .0003    .0403    .36031    4.68032
   3    .0012    .0012    .0015    .0418    .40211    5.08243
   4    .0012    .0024    .0039    .0457    .44781    5.53024
   5    .0012    .0036    .0075    .0532    .50101    6.03125
   6    .0012    .0048    .0123    .0655    .56651    6.59776
   7    .0012    .0060    .0183    .0838    .65031    7.24807
   8    .0012    .0072    .0255    .1093    .75961    8.00768
   9    .0012    .0084    .0339    .1432    .90281    8.91049
  10    .0012    .0096    .0435    .1867   1.08951   10.00000


Where any coefficients are negative in the equations expressed in ${}^{j}T_{i}$ terms,
they are simply added in as minus figures. 
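
The tabulation at a fractional interval can be imitated exactly with rational arithmetic. The sketch below is an illustration under stated assumptions: each power formula is multiplied by the interval raised to that power before like terms are added, each combined coefficient is entered in row i just before the operation at which j operations remain, and row 0 holds the constant term. The names are hypothetical.

```python
from fractions import Fraction

# Formulas (3) for the powers that occur in f(x) = 4 + 3x + 2x^2 + x^5.
POWER_TERMS = {
    0: {(1, 0): 1},
    1: {(2, 1): 1},
    2: {(2, 1): 1, (3, 2): 2},
    5: {(2, 1): 1, (4, 2): 30, (6, 3): 120},
}

def scaled_coefficients(poly, step):
    """Combine the power formulas after multiplying each by step**power."""
    combined = {}
    for power, a in poly.items():
        scale = Fraction(a) * Fraction(step) ** power
        for key, c in POWER_TERMS[power].items():
            combined[key] = combined.get(key, Fraction(0)) + scale * c
    return combined

def progressive_columns(coefficients, last_row):
    """One column per operation, rows 0..last_row, by repeated running totals."""
    n_ops = max(j for j, _ in coefficients)
    column = [Fraction(0)] * (last_row + 1)
    columns = []
    for op in range(1, n_ops + 1):
        remaining = n_ops - op + 1
        for (j, i), c in coefficients.items():
            if j == remaining:
                column[i] += c
        running, totals = Fraction(0), []
        for v in column:
            running += v
            totals.append(running)
        column = totals
        columns.append(column)
    return columns
```

With `step = Fraction(1, 10)`, the last column gives f(0), f(0.1), ···, f(1.0) exactly, so each row can be checked against a direct evaluation of the polynomial.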

To round off the preceding function to 3 decimal places, we add 5 to the
constant term in the position to the right of the last decimal retained, i.e. in
this case the 4th decimal place. The constant term is then 4.0005.


     Exact      Counter reads      Prints

    4.32001        4.32051          4.320
    4.68032        4.68082          4.680
    5.08243        5.08293          5.082
    5.53024        5.53074          5.530
    6.03125        6.03175          6.031
    6.59776        6.59826          6.598
    7.24807        7.24857          7.248
    8.00768        8.00818          8.008
    8.91049        8.91099          8.910
   10.00000       10.00050         10.000
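
The rounding device amounts to adding half a unit of the last retained place once, in the constant term, and then printing only the retained places. A minimal sketch using decimal arithmetic follows; the function name is hypothetical, and truncation on printing is assumed to be what the machine does when only the leading counter positions are wired to print.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def machine_round(exact, places):
    """Return (counter reading, printed value) for rounding to `places` decimals."""
    bias = Decimal(5).scaleb(-(places + 1))   # e.g. 0.0005 for 3 places
    counter = Decimal(exact) + bias
    unit = Decimal(1).scaleb(-places)         # e.g. 0.001 for 3 places
    printed = counter.quantize(unit, rounding=ROUND_DOWN)
    return counter, printed
```

For non-negative values the printed figure is the same as ordinary half-up rounding of the exact value, which can be confirmed with `Decimal(exact).quantize(unit, rounding=ROUND_HALF_UP)`.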


5. Automatic calculation of polynomial coefficients. Frequently when
polynomials are being evaluated, the process of forming the coefficients can be
performed automatically from a punched-card table. Such a table consists of a
set of cards for each power $x^s$ containing the multiples of all the coefficients of
each of the terms ${}^{j}T_{i}$ in the formula (3) for that power. These multiples are
1, 2, 3, 4, ..., 9; 10, 20, 30, 40, ..., 90; 100, 200, ..., 900; 1000, 2000, etc.,
and may be produced automatically by making a linear table of each coefficient
in the manner described in this paper. Each card is punched with the
information called for by the heading of the following card form:


    s      j      i      multiple      coeff. × multiple

    07     06     03     00005         008400


The particular figures indicated are those which would be punched for the
term 5(1680) ${}^{6}T_{3}$ in the representation of $5x^7$ according to formula (3).

The table is used by withdrawing the cards for the coefficients a, b, c, d, etc.
of the desired polynomial. For instance, if one of the polynomial coefficients is
14485 we select from the $x^7$ section of the table all cards containing the
multiples 10000, 4000, 400, 80, and 5. In the $x^7$ table there are 4 cards for each
multiple, one each for terms ${}^{8}T_{4}$, ${}^{6}T_{3}$, ${}^{4}T_{2}$, and ${}^{2}T_{1}$. These cards are combined with
the cards selected for the other coefficients of the polynomial and sorted to bring
all cards for each term ${}^{j}T_{i}$ together. The cards for each term are then
automatically added on the electric accounting machine.


6. Subdividing tables. In preparing tables it may be desired to prepare 
the table in more detail at certain points, giving values of the function at 1/10,
1/20, 1/50, or 1/100, etc., of the interval of the rest of the table. This may 
readily be done by recalculating the coefficients of the cumulative terms, and 
using these values in the same manner as the original ones. 

There are many formulas for the determination of the subdivided differences 
given in various texts on interpolation, such as those given by Comrie [8] and 
Bower [9]. One effective method is to use formulas (3) to calculate the sub- 
divided differences. The values called for in the formula for the highest power 
are taken from the table of the function at the regular interval, giving effect to 
the rule involving subscripts. These coefficients are reduced by an amount 
sufficient to cancel the coefficient of the highest cumulative term, and the coeffi- 
cients of the remaining cumulative terms are reduced in proportion according 
to formula (3) for the highest power. Usually the coefficient of the highest term 
of the formula will divide evenly into the coefficient taken from the table, and
the other reductions are calculated by multiplying this result by the other 
coefficients of the formula. The highest remaining coefficient is then reduced 
by an amount sufficient to cancel itself, and, by use of the formula (3) for the 
power whose highest cumulative term matches the highest remaining coefficient, 
the reduction to the remaining cumulative terms is calculated and subtracted. 
The highest remaining coefficient is reduced in a like manner, and this process is
continued until all the cumulative coefficients have been analysed. 

The partial cumulative coefficients thus computed are multiplied by the desired
subdivision 1/m raised to the power of the corresponding formula (3),
and recombined to form the new coefficients, as shown in the example below.
In taking values from the table, when the subscript does not change, the tabular 
value must be reduced by the amount of the higher coefficient with the same 
subscript, to give effect to the rule that the coefficients in such cases are incre- 
ments (see last example in section 3). 

To subdivide the polynomial of section 4 at x = 7.0, we take the italicized
values from Table V starting at f(7), and proceed as follows:


                    ⁵T      ⁴T      ³T      ²T      ¹T      ⁰T

From Table V       120     960    3390   10324   15994   16930
                          -120           -3390

f(x)               120     840    3390    6934   15994   16930

ax⁵                120              30               1
                           840    3360    6934   15993

bx⁴                        840     420      70      35
                                  2940    6864   15958

cx³                               2940             490
                                          6864   15468

dx²                                       6864    3432
                                                 12036

ex                                               12036
                                                         16930

If the interval is 1/10 we have:

                      ⁵T       ⁴T        ³T        ²T           ¹T        ⁰T

x⁵/10⁵      =     .00120             .00030                 .00001
35x⁴/10⁴    =               .0840    .04200     .0070       .00350
490x³/10³   =                       2.94000                 .49000
3432x²/10²  =                                 68.6400     34.32000
12036x/10   =                                           1203.60000
                                                                     + 16930


f(x) = .00120 ⁵T + .0840 ⁴T + 2.9823 ³T + 68.6470 ²T + 1238.41351 ¹T +
16930 ⁰T provides the coefficients for subtabulating the function at the desired
interval, beginning at the argument x = 7.0.
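
The arithmetic of this example can be retraced with exact fractions. The sketch below is an illustration under stated assumptions: the per-power weights in `FORMULA` are read off the worked table (each row divided by its leading factor) rather than copied from formula (3) itself, the cumulative terms are indexed only by their order 5, 4, ..., 0, and the names `peel` and `subdivide` are hypothetical.

```python
from fractions import Fraction

# Weights of each power on the cumulative terms, keyed by order of the term.
# Assumption: inferred from the worked table, e.g. x^4 -> 24, 12, 2, 1.
FORMULA = {
    5: {5: 120, 3: 30, 1: 1},
    4: {4: 24, 3: 12, 2: 2, 1: 1},
    3: {3: 6, 1: 1},
    2: {2: 2, 1: 1},
    1: {1: 1},
    0: {0: 1},
}


def peel(cumulative):
    """Split cumulative coefficients into one partial row per power."""
    rest = {k: Fraction(v) for k, v in cumulative.items()}
    rows = {}
    for power in sorted(FORMULA, reverse=True):
        weights = FORMULA[power]
        # Factor that cancels the highest remaining cumulative coefficient.
        factor = rest.get(power, Fraction(0)) / weights[power]
        rows[power] = {k: factor * w for k, w in weights.items()}
        for k, v in rows[power].items():
            rest[k] = rest.get(k, Fraction(0)) - v
    return rows


def subdivide(cumulative, m):
    """Scale each partial row by (1/m)**power and recombine."""
    new = {k: Fraction(0) for k in cumulative}
    for power, row in peel(cumulative).items():
        scale = Fraction(1, m) ** power
        for k, v in row.items():
            new[k] += v * scale
    return new


start = {5: 120, 4: 840, 3: 3390, 2: 6934, 1: 15994, 0: 16930}
for order, value in sorted(subdivide(start, 10).items(), reverse=True):
    print(order, float(value))
```

With the starting values of the example and m = 10 the recombined coefficients agree with the column sums of the second table.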

7. Accuracy of Tables. When the values of the coefficients are not
exact, owing to the original values for a, b, c etc. or the dropping of decimals in
the computation of the coefficients, the errors accumulate fairly rapidly. Each
coefficient will introduce its own error into the summation.
To maintain accuracy throughout a long table it is advisable to transform f(x)
by Horner's method of decreasing the roots [10, pp. 100-101], compute new
coefficients for the transformed equation at intervals, and prepare the table in
sections. Decreasing the roots by r gives us a new starting point at x = r.
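
Horner's method of decreasing the roots amounts to repeated synthetic division by (x - r). A minimal sketch follows; the function name `decrease_roots` and the highest-degree-first coefficient order are choices made here for illustration, not the paper's notation.

```python
def decrease_roots(coeffs, r):
    """Return the coefficients of g(y) = f(y + r).

    coeffs lists the coefficients of f, highest degree first.
    Each pass of the outer loop is one synthetic division by (x - r).
    """
    c = list(coeffs)
    n = len(c)
    for i in range(n - 1):
        for j in range(1, n - i):
            c[j] += r * c[j - 1]
    return c


# f(x) = x^3 shifted to start at x = 2: (y + 2)^3 = y^3 + 6y^2 + 12y + 8
print(decrease_roots([1, 0, 0, 0], 2))  # [1, 6, 12, 8]
```

The constant term of the result is f(r), so a table section restarted at x = r begins from an exactly recomputed value.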

Since two or more functions may be computed at one time, a function for 
which the coefficients are not exact may be computed by adding in the usual 
way from the starting values and subtracting from the ending values simul- 
taneously. As many digits as agree in both tabulations of the function may be 
considered correct. 

The tabulations can be made to practically any degree of accuracy on the 
equipment available, as the newer machines can be formed into counters of any 
capacity up to 80 digits. In practice, counters of 16, 20 or 24 digits will ordi- 
narily suffice for the accuracy desired and two or more functions can be evaluated 
simultaneously. Cards are read and added at the rate of 150 per minute, or read, 
added and listed on the tape at the rate of 80 per minute and new summary 
cards produced at the rate of 40 per minute (on alphabetic equipment with gang 
summary punches). Computation may be carried out with additional decimal 
places and the final tabulation of the function rounded off to the nearest number
retained. 

8. Summary. The cumulative or progressive-total method is shown to be
applicable to the preparation of tables of functions expressed in the form of 
a power series. 

The cumulative formulas for the powers through the twelfth power have been 
presented, and simple methods are given for transforming a power series into its 
corresponding cumulative formula, for changing the interval of the table, 
rounding off the values of the function, and subdividing the table at desired 
points. 

It is hoped that this discussion will make tables in printed or punched-card 
form more generally available as a tool for the computer. Since tables may be so 
readily prepared by this process, the usefulness of the tabular method of solving 
problems is greatly increased. 

The author wishes to acknowledge his thanks to Professor P. S. Dwyer for 
various suggestions, particularly in connection with section 2. 
ON THE PROBABILITY OF THE OCCURRENCE OF AT LEAST m
EVENTS AMONG n ARBITRARY EVENTS


By Kai Lai Chung

Tsing Hua University, Kunming, China

Introduction. Let $E_1, \cdots, E_n$ denote $n$ arbitrary events. Let
$p_{\nu_1' \cdots \nu_i' \nu_{i+1} \cdots \nu_j}$, where $0 \le i \le j \le n$ and $(\nu_1, \cdots, \nu_j)$ is a combination of the
integers $(1, \cdots, n)$, denote the probability of the non-occurrence of $E_{\nu_1}, \cdots, E_{\nu_i}$
and the occurrence of $E_{\nu_{i+1}}, \cdots, E_{\nu_j}$. Let $p_{[\nu_1 \cdots \nu_j]}$ denote the probability of
the occurrence of $E_{\nu_1}, \cdots, E_{\nu_j}$ and no others among the $n$ events. Let
$S_j = \sum p_{\nu_1 \cdots \nu_j}$ where the summation extends to all combinations of $j$ of the $n$ integers
$(1, \cdots, n)$. Let $p_m(\nu_1, \cdots, \nu_k)$, $(1 \le m \le k \le n)$, denote the probability of
the occurrence of at least $m$ events among the $k$ events $E_{\nu_1}, \cdots, E_{\nu_k}$.

By the set $(x_1, \cdots, x_b, \cdots, x_a) - (x_1, \cdots, x_b)$ (where $b \le a$) we mean the
set $(x_{b+1}, \cdots, x_a)$. And by a $\binom{a}{b}$-combination out of $(x_1, \cdots, x_a)$ we mean
a combination of $b$ integers out of the $a$ integers $(x_1, \cdots, x_a)$.

We often use summation signs with their meaning understood, thus for a fixed
$k$, $1 \le k \le n$, the summations in $\sum p_{\nu_1 \cdots \nu_k}$, or $\sum p_m(\nu_1, \cdots, \nu_k)$, extend to all
the $\binom{n}{k}$-combinations out of $(1, \cdots, n)$.

The following conventions concerning the binomial coefficients are made:

$\binom{0}{0} = 1$, $\binom{a}{b} = 0$ if $a < b$ or if $b < 0$.

It is a fundamental theorem in the theory of probability that, if $E_1, \cdots, E_n$
are incompatible (or "mutually exclusive"), then

$$p_1(1, \cdots, n) = p_1 + \cdots + p_n .$$

When the events are arbitrary, we have Boole's inequality

$$p_1(1, \cdots, n) \le p_1 + \cdots + p_n .$$

Gumbel¹ has generalized this inequality to the following:

$$p_1(1, \cdots, n) \le \frac{\sum p_1(\nu_1, \cdots, \nu_k)}{\binom{n-1}{k-1}}$$

for $k = 1, \cdots, n$. The case $k = 1$ gives Boole's inequality. Fréchet² has
announced that Gumbel's result can be sharpened to the following:

(1) $A_{k+1} = \dfrac{\sum p_1(\nu_1, \cdots, \nu_{k+1})}{\binom{n-1}{k}} \le \dfrac{\sum p_1(\nu_1, \cdots, \nu_k)}{\binom{n-1}{k-1}} = A_k$

for $k = 1, \cdots, n-1$. Thus, $A_k$ is non-increasing for $k$ increasing. On the
other hand, Poincaré has obtained the following formula which expresses
$p_1(1, \cdots, n)$ in terms of the $S_j$'s,

(2) $p_1(1, \cdots, n) = \sum p_{\nu_1} - \sum p_{\nu_1 \nu_2} + \sum p_{\nu_1 \nu_2 \nu_3} - \cdots + (-1)^{n-1} p_{1 \cdots n} .$

¹ C. R. Acad. Sc., Vol. 205 (1937), p. 774.

In the present paper we shall study the more general function $p_m(\nu_1, \cdots, \nu_k)$
as defined above. First we generalize Poincaré's formula and Fréchet's inequalities.
In Theorem 1 we establish (for $1 \le m \le n$)

(3) $p_m(1, \cdots, n) = \sum p_{\nu_1 \cdots \nu_m} - \binom{m}{1} \sum p_{\nu_1 \cdots \nu_{m+1}} + \binom{m+1}{2} \sum p_{\nu_1 \cdots \nu_{m+2}} - \cdots + (-1)^{n-m} \binom{n-1}{m-1} p_{1 \cdots n} .$

Although this result is well known, we prove it in preparation for Theorem 2.
Theorem 3 establishes

(4) $A_{k+1}^{(m)} = \dfrac{\sum p_m(\nu_1, \cdots, \nu_{k+1})}{\binom{n-m}{k+1-m}} \le \dfrac{\sum p_m(\nu_1, \cdots, \nu_k)}{\binom{n-m}{k-m}} = A_k^{(m)}$

for $k = 1, \cdots, n-1$ and $1 \le m \le k$.

Next, we extend the inequalities (4), and in Theorem 4 we show that

(5) $A_k^{(m)} \le \tfrac{1}{2}\left(A_{k-1}^{(m)} + A_{k+1}^{(m)}\right);$

which states that the differences $A_k^{(m)} - A_{k+1}^{(m)}$ $(k = 1, \cdots, n-1)$ are non-increasing
for increasing $k$. From this and a simple result we can deduce (4). Also
Theorem 2 establishes that

(6) $\sum_{i=0}^{2l+1} (-1)^i \binom{m+i-1}{i} \sum p_{\nu_1 \cdots \nu_{m+i}} \le p_m(1, \cdots, n) \le \sum_{i=0}^{2l} (-1)^i \binom{m+i-1}{i} \sum p_{\nu_1 \cdots \nu_{m+i}}$

for $2l + 1 \le n - m$ and $2l \le n - m$ respectively. These inequalities throw
light on formula (3) and are sharper than the following analogue of Boole's
inequality for $p_m(1, \cdots, n)$, which is a special case of (4):

(7) $p_m(1, \cdots, n) \le \sum p_{\nu_1 \cdots \nu_m} .$

The last statement will be evident in the proof.

² Loc. cit., Vol. 208 (1939), p. 1708.

In Theorem 5 we give an "inversion" of the formula (3), i.e. we express $p_{1 \cdots n}$
in terms of the $p_m(\nu_1, \cdots, \nu_k)$'s, as follows:

(8) $\binom{n-1}{m-1} p_{1 \cdots n} = \sum p_m(\nu_1, \cdots, \nu_m) - \sum p_m(\nu_1, \cdots, \nu_{m+1}) + \cdots + (-1)^{n-m} p_m(1, \cdots, n) = \sum_{i=0}^{n-m} (-1)^i \sum p_m(\nu_1, \cdots, \nu_{m+i}) .$

This of course implies the following more general formula for $p_{\alpha_1 \cdots \alpha_r}$:

$$\binom{r-1}{m-1} p_{\alpha_1 \cdots \alpha_r} = \sum_{i=0}^{r-m} (-1)^i \sum p_m(\nu_1, \cdots, \nu_{m+i}),$$

where $(\alpha_1, \cdots, \alpha_r)$ is a combination of the integers $(1, \cdots, n)$ and where the
second summation extends to all the $\binom{r}{m+i}$-combinations of $(\alpha_1, \cdots, \alpha_r)$.

Since it is known³ that we can express other functions such as $S_r$, $p_{[\alpha_1 \cdots \alpha_r]}$ in
terms of the $p_{\alpha_1 \cdots \alpha_r}$'s, we can also express them in terms of the $p_m(\nu_1, \cdots, \nu_k)$'s,
provided $r \ge m$.

Finally, for the case $m = 1$, we give in Theorem 6 an explicit formula for
$p_{[1 \cdots r]}$ in terms of the $p_1(\nu_1, \cdots, \nu_k)$'s, as shown in (9),

(9) $p_{[1 \cdots r]} = -p_1(r+1, \cdots, n) + \sum_{\nu_1} p_1(\nu_1, r+1, \cdots, n) - \sum_{\nu_1, \nu_2} p_1(\nu_1, \nu_2, r+1, \cdots, n) + \cdots + (-1)^{r-1} p_1(1, \cdots, r, r+1, \cdots, n) = \sum_{i=0}^{r} (-1)^{i-1} \sum p_1(\nu_1, \cdots, \nu_i, r+1, \cdots, n),$

where $(\nu_1, \cdots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations from $(1, \cdots, r)$.
This of course implies the following more general formula:

$$p_{[\alpha_1 \cdots \alpha_r]} = \sum_{i=0}^{r} (-1)^{i-1} \sum_{(\nu_1, \cdots, \nu_i)} p_1(\nu_1, \cdots, \nu_i, \alpha_{r+1}, \cdots, \alpha_n),$$

where $(\alpha_1, \cdots, \alpha_r, \cdots, \alpha_n)$ is a permutation of $(1, \cdots, n)$ and where
$(\nu_1, \cdots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations out of $(\alpha_1, \cdots, \alpha_r)$. From
Theorem 6 and two lemmas we deduce a condition of existence of systems of
events associated with the probabilities $p_1(\nu_1, \cdots, \nu_k)$. The author has not
been able to obtain similar elegant results for the general $m$. Probably they
do not exist.

³ Fréchet, "Condition d'existence de systèmes d'événements associés à certaines
probabilités," Jour. de Math., (1940), p. 61-62.
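
The statements (3), (4) and (8) can be checked numerically on a small random probability space. The sketch below is an illustration only: the atoms are the 2^n occurrence patterns with random rational weights, events are numbered from 0 rather than 1, and all function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb


def random_space(n, seed=0):
    """Return {pattern: probability}; bit e of pattern is 1 when event e occurs."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return {mask: Fraction(w, total) for mask, w in enumerate(weights)}


def count(mask, events):
    """Number of the listed events that occur in the pattern."""
    return sum((mask >> e) & 1 for e in events)


def p_at_least(space, m, events):
    """p_m(events): probability that at least m of the listed events occur."""
    return sum((pr for mask, pr in space.items()
                if count(mask, events) >= m), Fraction(0))


def p_all(space, events):
    """Probability that all the listed events occur."""
    return sum((pr for mask, pr in space.items()
                if count(mask, events) == len(events)), Fraction(0))


def rhs_formula_3(space, n, m):
    total = Fraction(0)
    for j in range(m, n + 1):
        s_j = sum((p_all(space, c) for c in combinations(range(n), j)), Fraction(0))
        sign = 1 if (j - m) % 2 == 0 else -1
        total += sign * comb(j - 1, m - 1) * s_j
    return total


def ratio_4(space, n, m, k):
    """The quantity A_k^(m) of (4), evaluated here for m <= k <= n."""
    total = sum((p_at_least(space, m, c)
                 for c in combinations(range(n), k)), Fraction(0))
    return total / comb(n - m, k - m)


def rhs_formula_8(space, n, m):
    total = Fraction(0)
    for i in range(n - m + 1):
        sign = 1 if i % 2 == 0 else -1
        total += sign * sum((p_at_least(space, m, c)
                             for c in combinations(range(n), m + i)), Fraction(0))
    return total


n = 4
space = random_space(n)
for m in range(1, n + 1):
    assert rhs_formula_3(space, n, m) == p_at_least(space, m, range(n))
    assert rhs_formula_8(space, n, m) == comb(n - 1, m - 1) * p_all(space, range(n))
    ratios = [ratio_4(space, n, m, k) for k in range(m, n + 1)]
    assert all(a >= b for a, b in zip(ratios, ratios[1:]))
print("formulas (3), (4), (8) hold on this example")
```

Exact fractions are used so that the equalities are tested without rounding error.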


2. Generalization of Poincaré's formula; Generalization and sharpening of
Boole's inequality.

Theorem 1:

$$p_m(1, \cdots, n) = \sum p_{\nu_1 \cdots \nu_m} - \binom{m}{1} \sum p_{\nu_1 \cdots \nu_{m+1}} + \binom{m+1}{2} \sum p_{\nu_1 \cdots \nu_{m+2}} - \cdots + (-1)^{n-m} \binom{n-1}{m-1} p_{1 \cdots n} .$$

Proof: We have

(10) $p_m(1, \cdots, n) = \sum_{b=0}^{n-m} \sum p_{[\nu_1 \cdots \nu_{m+b}]},$

where the second summation extends, for a fixed $b$, to all the $\binom{n}{m+b}$-combinations
of $(1, \cdots, n)$. Further we have

(11) $p_{\nu_1 \cdots \nu_{m+c}} = \sum_{d=0}^{n-m-c} \sum p_{[\nu_1 \cdots \nu_{m+c+d}]},$

where the second summation extends, for a fixed $d$, to all the $\binom{n-m-c}{d}$-combinations
of $(1, \cdots, n) - (\nu_1, \cdots, \nu_{m+c})$. The formulas (10) and (11) are
evident by observing that the probabilities in the summations are all additive.
Now we count the number of times a fixed $p_{[\nu_1 \cdots \nu_{m+b}]}$ appears in (3). By (11)
this is equal to the sum

$$\binom{m+b}{m} - \binom{m}{1}\binom{m+b}{m+1} + \binom{m+1}{2}\binom{m+b}{m+2} - \cdots = 1,$$

since this number is the coefficient of $(-1)^b x^b$ in the expansion of
$(1 - x)^{-m} (1 - x)^{m+b} = (1 - x)^b$.


Thus by (10) we have (3). 
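Theorem 2: For $2l + 1 \le n - m$ and $2l \le n - m$ respectively, we have

(6) $\sum_{i=0}^{2l+1} (-1)^i \binom{m+i-1}{i} \sum p_{\nu_1 \cdots \nu_{m+i}} \le p_m(1, \cdots, n) \le \sum_{i=0}^{2l} (-1)^i \binom{m+i-1}{i} \sum p_{\nu_1 \cdots \nu_{m+i}} .$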


Proof: By the reasoning in the previous proof, it is sufficient (in fact also 
necessary) to show that 

*1 / 1 / i_ i.\ iW 




Since 


/m — 1 + i\/in + ^ {m + b)\ /b\ 1 

\ i )\m + i) (m — 1)! h\ \i/ m + i 

is an integer, it is sufficient to show that 

(12) x:(-i)(^) 1 >0. 1 go. 

^•4) V / + t immO V / -f- t 


b — i 

Suppose ft > 0 is even. For i ^ ft/2 — 1 , we have - 

^ -h 1 


> 1 so that T 


b — i 


i+ 1 


t + 2 
t + 1 


Also 


m + i 


“j” i *4" 1 t Hh 2 


^ V - i"— i for m ^ 1 . Hence 


( ^ \ ^ =s ^ ^ ^ m + i /ft\ 

\t + 1/ m + i + 1 i+lm + i + l \i/ m + i 

^ ±2 (h\ 1 ^ /ft\ 1 

1 + 1 ^ + 2 \v m + i \i/ m + f 
For i S ft/2 we have ^ ^ < 1 so that ^-r— ; < 1 and 

(. ^ "j—J 

\i + 1/ m + 2 + 1 \v m + i 
Thus the absolute values of the terms of the alternating series 


Mfb 


61 


V — D* 

\i/m + i (m + ft)!(m — 1)! 

are monotone increasing as long as ^ g ~ — 1, reaching maximum at ^ == ^ and 

2 2 

tlien become monotone decreasing. 

Therefore (12) evidently holds for 21 S 6/2 and 2f + I g 6/2 respectively. 
For < S I + 1 we write 

ra '■ ' \t/ m + * (m + 6 )!(to — 1)! ^ ' \i/ m + i 


61 


.-<+1 

5-(-l 


(m + 6)!(m-l)l SS '' ^ W»n + fc-i‘ 
From the above and the fact that


61 


1 


we see that the 


(m + 6)!(m — 1)1 m + b 
righthand side is an alternating series whose terms are non-decreasing in absolute 
values. Hence (12) is true. 

If $b$ is odd, the case is similar.


3. Generalization of Fréchet's inequalities and related inequalities. Before
proving our remaining theorems, we shall give a more detailed account of
the general method which will be used. In the foregoing work we have already
given two different expressions for the function $p_m(1, \cdots, n)$, namely,
formulas (3) and (10), but they are not convenient for our later purposes.
Formula (3) is inconvenient because it is not additive and because the
$p_{\nu_1 \cdots \nu_j}$'s are related in magnitudes; while formula (10) has gone so far in the separation
of the additive constituents that its application raises algebraical difficulties.
Let us therefore take an intermediate course.

Let each $\binom{n}{m}$-combination $(\nu_1, \cdots, \nu_m)$ out of $(1, \cdots, n)$ be written so that
$\nu_1 < \nu_2 < \cdots < \nu_m$. Then we arrange them in an ordered sequence in the
following way: the combination $(\nu_1, \cdots, \nu_m)$ is to precede the combination
$(\mu_1, \cdots, \mu_m)$ if, for the first $\nu_i \ne \mu_i$, we have $\mu_i > \nu_i$. After such an arrangement
we symbolically denote these combinations by

$$\mathrm{I}, \mathrm{II}, \cdots, \binom{n}{m}.$$

Further, all the $\binom{k}{m}$-combinations out of $(\nu_1, \cdots, \nu_k)$ where the latter is a combination
out of $(1, \cdots, n)$ are arranged in the order in which they appear in
the sequence just written. For example, all the $\binom{4}{2}$-combinations out of
(1, 2, 3, 4) are ordered thus:

(12) (13) (14) (23) (24) (34).

Let $U$ denote a typical combination $(\mu_1, \cdots, \mu_m)$. By $E_U$ we mean the combination
of events $E_{\mu_1}, \cdots, E_{\mu_m}$ so that $p_U = p_{\mu_1 \cdots \mu_m}$. In general, let the
combinations $U_1, \cdots, U_{h-1}, U_h$ be given, then $p_{U_1' \cdots U_{h-1}' U_h}$ denotes the probability
of the non-occurrence of $U_1, \cdots, U_{h-1}$ and the occurrence of $U_h$.

Now let $\mathrm{I}, \mathrm{II}, \cdots, Z - 1, Z = \binom{k}{m}$ denote all the $\binom{k}{m}$-combinations
out of $(\nu_1, \cdots, \nu_k)$ in their assigned order. We have

(13) $p_m(\nu_1, \cdots, \nu_k) = p_{\mathrm{I}} + p_{\mathrm{I}'\mathrm{II}} + p_{\mathrm{I}'\mathrm{II}'\mathrm{III}} + \cdots + p_{\mathrm{I}' \cdots (Z-1)' Z} .$

This fundamental formula is evident. Of course it is possible to identify the
$p$'s on the right-hand side with the ordinary $p_{\nu_1' \cdots \nu_j}$'s, but we shall refrain from
so doing and be content with the following example:

$$p_2(1, 2, 3, 4) = p_{12} + p_{12'3} + p_{12'3'4} + p_{1'23} + p_{1'23'4} + p_{1'2'34} .$$
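
Formula (13) can be checked directly on a small example. The sketch below is an illustration under assumptions made here: a random probability space on the 16 occurrence patterns of four events (numbered from 1), with the m-combinations taken in the lexicographic order described above. The function names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def random_space(n, seed=0):
    """Return {pattern: probability}; bit e-1 of pattern is 1 when event e occurs."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return {mask: Fraction(w, total) for mask, w in enumerate(weights)}


def occurs(mask, combo):
    """True when every event of the combination occurs in the pattern."""
    return all((mask >> (e - 1)) & 1 for e in combo)


def p_at_least(space, m, events):
    """p_m(events): probability that at least m of the listed events occur."""
    return sum((pr for mask, pr in space.items()
                if sum((mask >> (e - 1)) & 1 for e in events) >= m), Fraction(0))


def formula_13(space, m, events):
    """Sum of P(U_h occurs and no earlier combination occurs) over all h."""
    combos = list(combinations(sorted(events), m))  # lexicographic order
    total = Fraction(0)
    for h, u in enumerate(combos):
        for mask, pr in space.items():
            if occurs(mask, u) and not any(occurs(mask, v) for v in combos[:h]):
                total += pr
    return total


space = random_space(4)
assert formula_13(space, 2, (1, 2, 3, 4)) == p_at_least(space, 2, (1, 2, 3, 4))
print(list(combinations((1, 2, 3, 4), 2)))
```

The terms are probabilities of mutually exclusive cases, which is why they add up to the probability that at least m events occur.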
Theorem 3. For $k = 1, \cdots, n-1$ and $1 \le m \le k$ we have

$$\binom{n-m}{k-m} \sum p_m(\nu_1, \cdots, \nu_{k+1}) \le \binom{n-m}{k+1-m} \sum p_m(\nu_1, \cdots, \nu_k) .$$

Proof. Substitute (13) and a similar formula for $k + 1$ into the two sides
respectively. After this substitution we observe that the number of terms is
the same on both sides, since

$$\binom{n-m}{k-m}\binom{n}{k+1}\binom{k+1}{m} = \binom{n-m}{k+1-m}\binom{n}{k}\binom{k}{m} .$$

Also, the number of terms with a given $U = (\mu_1, \cdots, \mu_m)$ unaccented is the
same, since

$$\binom{n-m}{k-m}\binom{n-m}{k+1-m} = \binom{n-m}{k+1-m}\binom{n-m}{k-m} .$$

Let the sum of all the terms with $U$ unaccented in the two summations be
denoted by $\sigma_{k+1} = \sigma_{k+1}(\mu_1, \cdots, \mu_m)$ and $\sigma_k = \sigma_k(\mu_1, \cdots, \mu_m)$ respectively. It
is sufficient to prove that

(14) $\binom{n-m}{k-m} \sigma_{k+1} \le \binom{n-m}{k+1-m} \sigma_k .$


k — m) form - - where 

0 ^ i ^ Mm — w and where (vi , • • • , , Mi , • * • , Mm) is a ^^combination 

otft Of (1, • • • , Mm). For fixed (mi , • • • , Mm) and a fixed Z but varying X’s, o'* 
contaiiis ^ terms of the form ’ exactly I accented 

subscripts. Let the sum of all such terms be denoted by ak\ Evidently trl^^ 
terms. As a check we have 

/n - Mm\ /Mm - / n - 

\k-mj\ 0 / ^ - l/\ 1 • 

/n - MmWMm - m\ ^ /n - m\ 
\fc - Mm/ \Mm - mJ \k - mJ* 

which is the total number of terms in ark • 

We decompose these p^a partially, as follows: 

where (ri , • • • , vy^ , m , • • • , Muh*) “ a permutation of (!,•••, m«.) and where 
the second summation extends, for a fixed b, to all the ^Mn* w ^^-combinar 
tions out of (1, V • , M») — (ri , • • • , , Ml » • • • M..). 
Now consider a given

Pp{ • • -piM • • 

where 0 ^ t ^ tim — m and (pi • • • pAi • • • X«mi ■ ■ ■ M») is & permutation of 
(!,•••, Mm)< It appears times in Hence it appears 

/» — l*m\ / A 4./ — Mm + t\ 

w- lAV'^ ''VAi-m-t/V/ \ *!-«» / 

times in ok . 

Therefore to prove (14) it is sufficient to prove that

$$\binom{n-m}{k-m}\binom{n-\mu_m+t}{k+1-m} \le \binom{n-m}{k+1-m}\binom{n-\mu_m+t}{k-m} .$$

By an easy reduction we have

$$(n - \mu_m + t - k + m) \le n - k$$

or

$$-\mu_m + t + m \le 0;$$

since $t \le \mu_m - m$ this is obvious.

Theorem 4: For $2 \le k \le n - 1$ and $1 \le m \le k$ we have

$$\frac{\sum p_m(\nu_1, \cdots, \nu_k)}{\binom{n-m}{k-m}} \le \frac{1}{2} \frac{\sum p_m(\nu_1, \cdots, \nu_{k-1})}{\binom{n-m}{k-1-m}} + \frac{1}{2} \frac{\sum p_m(\nu_1, \cdots, \nu_{k+1})}{\binom{n-m}{k+1-m}} .$$

Proof: By the reasoning in the previous proof, it is sufficient to show that

$$2 \binom{n-m}{k-1-m}\binom{n-m}{k+1-m}\binom{n-\mu_m+t}{k-m} \le \binom{n-m}{k-m}\binom{n-m}{k+1-m}\binom{n-\mu_m+t}{k-1-m} + \binom{n-m}{k-m}\binom{n-m}{k-1-m}\binom{n-\mu_m+t}{k+1-m},$$

for $0 \le t \le \mu_m - m$. By an easy reduction this is equivalent to

$$2(n-k)(n-\mu_m+t-k+m+1) \le (n-k+1)(n-k) + (n-\mu_m+t-k+m+1)(n-\mu_m+t-k+m)$$

or

$$(n-\mu_m+t-k+m+1)(\mu_m-t-m) \le (n-k)(\mu_m-t-m) .$$

For $t = \mu_m - m$ we have equality, otherwise we have

$$-\mu_m + t + m + 1 \le 0 .$$
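
The binomial inequality to which the proof of Theorem 4 is reduced can be spot-checked by brute force. The sketch below is an illustration only; `binom` encodes the convention stated in the Introduction (zero for a negative or too-large lower index), and the function names and the bound on n are choices made here.

```python
from math import comb


def binom(a, b):
    """Binomial coefficient with the convention: 0 when b < 0 or a < b."""
    return comb(a, b) if 0 <= b <= a else 0


def theorem4_sides(n, m, k, mu_m, t):
    """Left and right sides of the inequality used in the proof of Theorem 4."""
    big_n = n - mu_m + t
    lhs = 2 * binom(n - m, k - 1 - m) * binom(n - m, k + 1 - m) * binom(big_n, k - m)
    rhs = (binom(n - m, k - m) * binom(n - m, k + 1 - m) * binom(big_n, k - 1 - m)
           + binom(n - m, k - m) * binom(n - m, k - 1 - m) * binom(big_n, k + 1 - m))
    return lhs, rhs


def check_theorem4(max_n):
    """True when the inequality holds for every admissible n, k, m, mu_m, t."""
    for n in range(3, max_n + 1):
        for k in range(2, n):
            for m in range(1, k + 1):
                for mu_m in range(m, n + 1):
                    for t in range(mu_m - m + 1):
                        lhs, rhs = theorem4_sides(n, m, k, mu_m, t)
                        if lhs > rhs:
                            return False
    return True


print(check_theorem4(9))
```

Equality occurs when t = mu_m - m, in agreement with the closing remark of the proof.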
We can deduce Theorem 3 from Theorem 4 and the following result (a case
of generalized Gumbel inequalities):

(15) $(n - m)\, p_m(1, \cdots, n) \le \sum p_m(\nu_1, \cdots, \nu_{n-1}) .$

Proof of (15): Substitute from (13). Consider the $p$'s with $U$ unaccented.
The number of such terms is the same on both sides. But on the left-hand side
they are all the same $p_{\mathrm{I}'\mathrm{II}' \cdots (U-1)' U}$, while those on the right-hand side, being of
the form $p_{U_1' \cdots U_\lambda' U}$ where $0 \le \lambda \le U - 1$ and $(U_1, \cdots, U_\lambda)$ is a combination
out of $(\mathrm{I}, \cdots, U - 1)$, are greater than or equal to it. Hence the result.


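4. The $p_{\alpha_1 \cdots \alpha_r}$'s in terms of the $p_m(\nu_1, \cdots, \nu_k)$'s and the $p_{[\alpha_1 \cdots \alpha_r]}$'s in terms
of the $p_1(\nu_1, \cdots, \nu_k)$'s.

Theorem 5: For $1 \le m \le n$ we have

(8) $\binom{n-1}{m-1} p_{1 \cdots n} = \sum p_m(\nu_1, \cdots, \nu_m) - \sum p_m(\nu_1, \cdots, \nu_{m+1}) + \cdots + (-1)^{n-m} p_m(1, \cdots, n) = \sum_{i=0}^{n-m} (-1)^i \sum p_m(\nu_1, \cdots, \nu_{m+i}) .$

Proof: As in the proof of Theorem 3, consider $\sigma_k(\mu_1, \cdots, \mu_m)$. Here
$m \le k \le n$. Since a given

(16) $p_{\rho_1' \cdots \rho_t' \lambda_1 \cdots \lambda_s \mu_1 \cdots \mu_m}$

appears $\binom{n-\mu_m+t}{k-m}$ times in $\sigma_k$, it appears

$$\sum_{k=m}^{n} (-1)^{k-m} \binom{n-\mu_m+t}{k-m} = \begin{cases} 0, & \text{if } n - \mu_m + t \ge 1, \\ 1, & \text{if } n - \mu_m + t = 0, \end{cases}$$

times on the right hand side of (8). Hence for fixed $(\mu_1, \cdots, \mu_m)$, the only
$p$'s of the form (16) which actually appear are those with $t = \mu_m - n$. But
$\mu_m \le n$, thus $t = 0$, $\mu_m = n$, and $(\lambda_1, \cdots, \lambda_s, \mu_1, \cdots, \mu_m)$ is a permutation of
$(1, \cdots, n)$. The term in question is therefore $p_{1 \cdots n}$. Since the number of
$\binom{n}{m}$-combinations of $(1, \cdots, n)$ with $\mu_m = n$ is $\binom{n-1}{m-1}$, we have the theorem.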

Theorem 6: For $1 \le r \le n - 1$, we have

(9) $p_{[1 \cdots r]} = -p_1(r+1, \cdots, n) + \sum_{\nu_1} p_1(\nu_1, r+1, \cdots, n) - \sum_{\nu_1, \nu_2} p_1(\nu_1, \nu_2, r+1, \cdots, n) + \cdots + (-1)^{r-1} p_1(1, \cdots, n) = \sum_{i=0}^{r} (-1)^{i-1} \sum p_1(\nu_1, \cdots, \nu_i, r+1, \cdots, n),$

where $(\nu_1, \cdots, \nu_i)$ runs through all the $\binom{r}{i}$-combinations out of $(1, \cdots, r)$.

Proof: We rewrite (13) for the special case $m = 1$,

(17) $p_1(\mu_1, \cdots, \mu_h) = p_{\mu_1} + p_{\mu_1' \mu_2} + \cdots + p_{\mu_1' \cdots \mu_{h-1}' \mu_h},$

where $\mu_1 < \mu_2 < \cdots < \mu_h$. Substitute into the right hand side of (9). After
the substitution let the sum of all those $p$'s with $\mu$ unaccented be denoted by
$\sigma_\mu$. The terms in $\sigma_\mu$ are of the form $p_{\mu_1' \cdots \mu_{s-1}' \mu}$ where $1 \le s \le \mu$ and
$(\mu_1, \cdots, \mu_{s-1})$ is a combination out of $(1, \cdots, \mu - 1)$.

First consider a fixed m S r. For a fixed Pni-y.-m we count the number of 
times it appears in 9^ , that is, on the right hand side of (9). This is evidently 
equal to 


Z (-1)' 




0 , 

1 , 


if r - M S 1, 
if r — M == 0. 


Thus the only terms that actually appear are those with m = r; and each of such 
terms - appears exactly once with the sign (—I)*. Hence their total 
contribution is 


(18) Pr — ]C Vt^[r + S — * • * + ( — 1)*^ * Pi'- • • (r-l) 'r == Pl---f, 

PI Fl.rj 

by an easy modification of Poincaré's formula.

Next consider a fixed p ^ r + 1. Every term with m unaccented in is of 
the form (with the usual convention for /i = r + 1) p^j. - m; (r+i)'. vm-d'm > where 
(/ii , • • • , yt,) is a combination out of (1, • • • , r); and it appears exactly once 
with the sign ( — 1)*. Their total contribution is therefore 

““ P(r+l)'*-(M-l)V + 2 Ppi(r+1)'--(M~1)'M “ 2 PpiPa(r4-l) ' • ' * (m-D > + * • • 

Pi Pl.pa 

+ (“”1)*^^P1'---(m+1)'m ~ ” Pl‘-T(r+l)'---(M-l)V> 


by another application of Poincare's formula. Summing up for m = 
r + 1, • • • , n, we obtain 


(19) — (Pl...r(r+1) + Pl...r(r-|-l)'(r-|4) + •’• + Pl...r(r+l)'...(n-l)'n). 


Adding (18) and (19), we obtain as the sum of the right-hand side of (9) 


Pl...r (Pl...r(r+1) + Pl---r(r+l) '(r+2) -f" * • • + Pl...r(r+l)'...(n-l)'n) 


by an easy modification of (17). 


= Pl...r(r4-l)'(r+2)'..-n' = Ptl---r] 
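
Formula (9) lends itself to a direct numerical check. The following sketch assumes a random probability space whose atoms are the 2^n occurrence patterns, with events numbered from 1; the function names are hypothetical and the code is an illustration, not part of the proof.

```python
import random
from fractions import Fraction
from itertools import combinations


def random_space(n, seed=0):
    """Return {pattern: probability}; bit e-1 of pattern is 1 when event e occurs."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(2 ** n)]
    total = sum(weights)
    return {mask: Fraction(w, total) for mask, w in enumerate(weights)}


def p_1(space, events):
    """Probability that at least one of the listed events occurs."""
    return sum((pr for mask, pr in space.items()
                if any((mask >> (e - 1)) & 1 for e in events)), Fraction(0))


def p_exactly_first(space, r):
    """p_[1...r]: events 1..r occur and all the other events do not."""
    return space[(1 << r) - 1]


def rhs_formula_9(space, n, r):
    tail = tuple(range(r + 1, n + 1))
    total = Fraction(0)
    for i in range(r + 1):
        sign = -1 if i % 2 == 0 else 1  # this is (-1)**(i - 1)
        for nu in combinations(range(1, r + 1), i):
            total += sign * p_1(space, nu + tail)
    return total


n = 4
space = random_space(n)
for r in range(1, n):
    assert rhs_formula_9(space, n, r) == p_exactly_first(space, r)
print("formula (9) holds on this example")
```

Because each atom of the space is a complete occurrence pattern, p_[1...r] is simply the probability of one atom, which makes the comparison exact.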


5. A condition for existence of systems of events associated with the probabilities
$p_1(\nu_1, \cdots, \nu_k)$.

Lemma 1: Let any $2^n - 1$ quantities $q(\alpha_1, \cdots, \alpha_k)$ be given, where $k = 1, \cdots, n$,
and for a fixed $k$, $(\alpha_1, \cdots, \alpha_k)$ runs through all the $\binom{n}{k}$-combinations
out of $(1, \cdots, n)$. Let the quantities $Q(\alpha_1, \cdots, \alpha_k)$ be formed as follows:

$$Q(0) = 1 - q(1, \cdots, n),$$

$$Q(\alpha_1, \cdots, \alpha_k) = -q(\alpha_{k+1}, \cdots, \alpha_n) + \sum_{\nu_1} q(\nu_1, \alpha_{k+1}, \cdots, \alpha_n) - \sum_{\nu_1, \nu_2} q(\nu_1, \nu_2, \alpha_{k+1}, \cdots, \alpha_n) + \cdots + (-1)^{k-1} q(1, \cdots, n),$$

where $(\nu_1, \cdots, \nu_i)$ runs through all the $\binom{k}{i}$-combinations out of $(1, \cdots, n) -
(\alpha_{k+1}, \cdots, \alpha_n)$. Then the sum of all these $Q$'s is equal to 1.

Proof: Add all these $Q$'s and count the number of times a fixed $q(\mu_1, \cdots, \mu_k)$
appears in the sum. For $1 \le k \le n$ this number is equal to zero.
Hence we have the lemma.

Lemma 2: (Fréchet) Given $2^n$ quantities $Q_{[\alpha_1 \cdots \alpha_r]}$ where $(\alpha_1, \cdots, \alpha_r)$ runs
through all combinations out of $(1, \cdots, n)$ including the empty one. The necessary
and sufficient condition that there exist systems of events $E_1, \cdots, E_n$ for which

$$p_{[\alpha_1 \cdots \alpha_r]} = Q_{[\alpha_1 \cdots \alpha_r]}$$

(where $p_{[0]}$ denotes the probability for the non-occurrence of $E_1, \cdots, E_n$) is
that each $Q \ge 0$ and that their sum is equal to 1.

Proof: Since the probabilities $p_{[\alpha_1 \cdots \alpha_r]}$ are independent, i.e., unrelated in
magnitudes except that their sum is equal to 1, the lemma is evident.

Theorem 7: Given $2^n - 1$ quantities $q(\alpha_1, \cdots, \alpha_k)$ as in Lemma 1, the necessary
and sufficient condition that there exist systems of events $E_1, \cdots, E_n$ for which

$$p_1(\alpha_1, \cdots, \alpha_k) = q(\alpha_1, \cdots, \alpha_k)$$

is that for any combination $(\alpha_{r+1}, \cdots, \alpha_n)$, $1 \le r \le n - 1$, out of $(1, \cdots, n)$ we
have

$$-q(\alpha_{r+1}, \cdots, \alpha_n) + \sum_{\nu_1} q(\alpha_{\nu_1}, \alpha_{r+1}, \cdots, \alpha_n) - \sum_{\nu_1, \nu_2} q(\alpha_{\nu_1}, \alpha_{\nu_2}, \alpha_{r+1}, \cdots, \alpha_n) + \cdots + (-1)^{r-1} q(1, \cdots, n) \ge 0,$$

and that

$$1 - q(1, \cdots, n) \ge 0 .$$

Proof: The condition is necessary by Theorem 6. It is sufficient by Lemmas
1, 2 and an obvious formula expressing $p_1(\alpha_1, \cdots, \alpha_k)$ in terms of the $p_{[\alpha_1 \cdots \alpha_r]}$'s.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


A NOTE ON SHEPPARD’S CORRECTIONS 


By Cecil C. Craig 


University of Michigan 

As far as the author is aware, H. C. Carver was the first to point out that
while the formulae ordinarily given for Sheppard's corrections for central moments
are valid for moments computed about the population mean, there are
still systematic errors present when they are applied to central moments calculated
from any particular grouped frequency distribution [1]. This is due, of
course, to the fact that the mean of a grouped frequency distribution is in general
different from that of the distribution before grouping. For a fixed class interval
Sheppard's corrections give the average value of a moment about a fixed
point of a given order for all the groupings of this class width possible and will
fail to do so if the moment in question is calculated for each position of the class
limits about a point which varies as the class limits shift. Thus Carver [1]
pointed out that the commonly used formula (for a continuous variate),

(1) $\mu_2 = \nu_2 - \dfrac{k^2}{12},$

should, if $\nu_2$ is calculated about the mean of the grouped distribution as it is in
practice, be replaced by

(2) M2 = 

in which is the variance of the means of grouped distributions over all positions
of the class limits with the fixed class width k.

Recently J. A. Pierce [2] gave a method for deriving the required formulae of 
the type of (2) and gave actual formulae for both moments and seminvariants
through the sixth order. It is the purpose of this note to point out that the use 
of moment generating functions provides a more elegant and concise way of 
arriving at formulae equivalent to Pierce’s though in a somewhat different form. 
This method can be immediately extended to distributions of two or more 
variates. 

In a previous paper [3] on Sheppard's corrections for a discrete variate, the
author made use of the following argument: It is assumed that for a fixed class
width $k$, any point in the scale on which the variate $x$ is plotted is as likely to be
chosen as a class limit as any other; choosing a system of class limits for grouping
the data is then equivalent to placing at random on the $x$-axis a scale with
division points at intervals of $k$. Once the system of class limits is chosen any
value of $x$ before grouping bears to the class mark, $x_i$, of the class in which it
falls the relation,

(3) $x_i = x + \epsilon,$

in which $x$ and $\epsilon$ are independent variates. The frequency law governing $x$ is,
of course, that of the population from which it is drawn while $\epsilon$ is distributed

( k k\ 

— ~ ~ ) for a continuous variate 
and ^ kf — ^ if m consecutive values of a discrete variate are 


grouped in each class interval. In either case

(4) $M_{x_i}(\vartheta) = M_x(\vartheta) M_\epsilon(\vartheta),$

in which $M_{x_i}(\vartheta)$ is the moment generating function of the variate $x_i$, etc. The
expansion of both sides of (4) in powers of $\vartheta$ gives the relations between the
average values of moments of the grouped distribution over all positions of the
scale and the moments of the ungrouped distribution from which Sheppard's
corrections are obtained by solving for the moments of the ungrouped distribution.
The relations are valid for any fixed point about which the moments are
computed; if this fixed point be taken as the mean of the ungrouped distribution
the ordinary Sheppard's corrections for central moments result.

But it is quite easy to modify (4) to give the necessary relations in case the 
moments of each grouped distribution are computed about the mean of that 
distribution. We have only to write 

(6) x< * x< — f ^ 

in which 2 is the mean of the grouped distribution for which x< is one of the class 
marks. Then 

MxiW = w) 

( 6 ) 

= MxWMM) * 

If we write, 

in which is the product semin variant of order rs of moments about the means 
of the grouped distributions and of such means, the expansion of the logarithm 
of the second member of (6) gives 

(7) 1 4* (Xio + + (Xjo + 2Xu + Xoa) ^ + (Xjo + ' ' 
in which

(\m + Xci)**^* * Xh) + + • • • + (0 + • • • + )w • 

The expression of the logarithm of the right member is*: 


( 8 ) 


Xi«> + X* 


1* 

21 



+ 


+ i. (“ir‘ 


SJk** 

2s 


fl--") — 

\ m**/(28)!’ 


for a discrete variate (the result for a continuous variable is obtained merely by
letting $m \to \infty$) in which $\lambda_r$ is the rth seminvariant of the ungrouped distribution
and $B_s$ is the sth Bernoulli number.

We may without loss of generality take the origin for x at the mean of the 
ungrouped distribution so that Xi = 0. Further it is easy to see that 


Consider 


hr = 0, r = 0, 1, 2, 3, • • • . 

E[(X{ - £)£'] = Pir 


For a fixed £, i.e., for a given grouping, this becomes 

£''E{Xi - f ) » 0 




Then since Pu is the average of this over all groupings with a given class interval, 
Pir — 0, and from the expression for Xir in terms of the moments P.-,- it is obvious 
that also Xjr = 0. 


Then we must also have Xoi = 0 as is otherwise obvious and (7) can be rewritten 


(9) 


1 + (X»o + Xoj) + (X 30 + 3X»i + Xm) ^ "I" • • • • 


Now from (8) and (9) by equating coefficients of like powers of $\vartheta$, we get the
set of formulae:

Xi = 0 

(10) Xs * Xao + 3 X 21 + ^ 

X4 == ^ + 4 X«i + 6X28 + X04 + 

\ WU/ 12U 


These formulae, however, do not give the sought Sheppard's corrections for 
seminvariants calculated from grouped distributions of a discrete variate. See 
below. 

Referring to formula (10), p. 58 of the author's paper cited [3 ], it is easily seen 
by comparison that the required moment formulae are obtained from the general 
formula 

(11) Mn = ^ ai,(l»io + 
in which the coefficients are given by formula (9) of this former paper. For n = 1, 2, 3, 4
we write down immediately




fit 

(12) w 


M4 


0 (Pu “ »B1 = 0) 

f'm + 3l>ii + Am 

V40 + 4p»i + 6Pa + Am 


■ (' " s) I"” + >«> I + (i - ^.)(7 - ^.) 


In these formulae, $\nu_{r0}$ is, of course, the average value of rth central moments
about the means of grouped distributions. From the definition $\nu_{rs}$ $(s \ne 0)$ is the
average value of the product of the rth central moment of a grouped distribution
by the sth power of the mean of the same grouped distribution. Also, it must
be noted that in the formulae (10) the $\lambda_{rs}$'s there are to be calculated by the
usual formulae from the moments $\nu_{rs}$, and are not themselves the average values
of like seminvariants calculated from the separate grouped distributions. Thus
though the formulae (12) give the sought Sheppard's corrections for moments,
the formulae (10) do not do the like for seminvariants in general. However,
since in each grouped distribution,


X, = V* 


{md 


X» = K» 

we have, taking the expectation or average value over the grouped distributions. 


= Am = Xm 

and 

F(X») = E{vt) = Am = ^ , 

and the first two formulae of (10) do give the Sheppard’s corrections for Xj and 
Xs calculated from grouped distributions of a discrete variate. 

But the case for X4 is different. In each grouped distribution, 

X 4 = V4 — 3i4 , 

and if we define Ir by 


E(M = Ir, 
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we have 

*= 1^40 ~ 3E(vl) 

*« p|0 — 3(|^0 + « X 40 ““ 3l^:r, , 

if i»ijr, is the variance of in the grouped distributions. 

In a similar way one can obtain such formulae for seminvariants as may be
required. Through the sixth, the formulae for the Sheppard's corrections for
the seminvariants calculated from a grouped distribution of a discrete variate are:

Xs *= Zj + 3Xji + 


(13) ^ ^ 

X| = U + + 5^1 + 10Xs2 4* 10 X 28 + Xo* 

X« = Ze + 15 pu;»<i,» 4 4“ 10v2>i — 30 v8:»», — 90j'2:»i j'ao 

+ 6 X„ + 15X« + 20^ + 15Xm + ^ - (l - 

\ m®/ 252 

In these formulae, $\nu_{ij:\nu_r \nu_s}$ is the ijth central product moment of $\nu_r$ and $\nu_s$ in the
grouped distributions.

To illustrate these formulae numerically and to facilitate comparison with
Pierce's results, we will use the example he chose. His ungrouped distribution
was:

    v     f        v     f        v     f

    1     2        4    30        7     1
    2     8        5     4        8     1
    3    10        6     3        9     1


From this the following three grouped distributions with k = 3 can be formed:

         (1)                  (2)                  (3)

    class      f         class      f         class      f

     1- 3     20          0- 2     10         -1- 1      2
     4- 6     37          3- 5     44          2- 4     48
     7- 9      3          6- 8      5          5- 7      8
    10-12      0          9-11      1          8-10      2
With origin at v = 4, we have the following table of moment characteristics
of these four distributions:


Distribution   nu_1     nu_2 = lambda_2   nu_3 = lambda_3   nu_4              lambda_4          nu_1 - (-1/6)

(1)             9/60     9819/60^2         -17442/60^3      238849317/60^4    -50388966/60^4     19/60
(2)            -9/60    10179/60^2         567162/60^3      557840277/60^4    247004154/60^4      1/60
(3)           -30/60     8820/60^2        1317600/60^3      528282000/60^4    294904800/60^4    -20/60
Average       -10/60     9606/60^2         622440/60^3      441657198/60^4    163839996/60^4


                        mu_1     mu_2 = lambda_2   mu_3 = lambda_3   mu_4              lambda_4

Original
Distribution           -1/6      7460/60^2         642400/60^3      305034000/60^4    138079200/60^4



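From the table,

    $\nu_{20} = \lambda_{20} = \frac{9606}{60^2}$,

    $\nu_{30} = \lambda_{30} = \frac{622440}{60^3}$,

    $\nu_{40} = \lambda_{40} + 3\lambda_{20}^2 = \frac{441657198}{60^4}$.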


We further compute: 




i>D4 


mSp'iY _ 254 _ 5 
3 60* ^ 


-380 . 

"6^ 

96774 

60< 


— Pwi’u 


-78978

60* 


Ou 


ho 

Pn 


_ 6780 _ ^ 

3 60* ” 


8705412 

60* 

2360946 

60* 


Xjj 


■* Poi — 3 Pm 


96774 

6 ''* 
    $\nu_{2:\nu_2} = \frac{991494}{3 \cdot 60^4} = \frac{330498}{60^4}$,


ii «= X40 — • 3 ys:rj, « P40 — 3 Pso — 3 i<i:», 


(‘ 

(■ 


l\£ 2 

nV 12 3 


240 


2 . 


163839996 

60* 


With these values one may check the formulae (12) and (13) as far as weight 
four. For example: 

    $\lambda_2 = \frac{9606}{60^2} + \frac{254}{60^2} - \frac{2}{3} = \frac{7460}{60^2}$,

    $\lambda_4 = \frac{1}{60^4}(163839996 + 991494 - 34821648 - 473868 - 96774 + 8640000)$

    $\;\;\; = \frac{138079200}{60^4}$.
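
The arithmetic of this example can be re-traced mechanically. The following is only a sketch, assuming that each class is represented by its midpoint and that the three groupings start at v = 1, 0 and -1; the function and variable names are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Ungrouped distribution from the text: value v -> frequency f (N = 60).
ungrouped = {1: 2, 2: 8, 3: 10, 4: 30, 5: 4, 6: 3, 7: 1, 8: 1, 9: 1}
K = 3        # class width (k = 3 in the text)
ORIGIN = 4   # origin used for the moment table


def central_moments(dist):
    """Return (mean, mu2, mu3, mu4) of a {value: frequency} table."""
    n = sum(dist.values())
    mean = Fraction(sum(f * x for x, f in dist.items()), n)
    mu = [sum(f * (x - mean) ** r for x, f in dist.items()) / n for r in (2, 3, 4)]
    return (mean, *mu)


def grouped(start):
    """Group into classes of width K beginning at `start`; use class midpoints."""
    out = {}
    for v, f in ungrouped.items():
        mid = start + K * ((v - start) // K) + (K - 1) // 2
        out[mid - ORIGIN] = out.get(mid - ORIGIN, 0) + f
    return out


def avg(values):
    return sum(values) / len(values)


# Grouped distributions (1), (2), (3) and their moment characteristics.
stats = [central_moments(grouped(s)) for s in (1, 0, -1)]

nu10 = avg([s[0] for s in stats])                  # average mean
nu20 = avg([s[1] for s in stats])                  # average second moment
l4 = avg([s[3] - 3 * s[1] ** 2 for s in stats])    # average fourth seminvariant
var_nu1 = avg([(s[0] - nu10) ** 2 for s in stats])  # variance of the class means

# Check of the second seminvariant, as in the worked example.
lambda2_corrected = nu20 + var_nu1 - Fraction(K * K - 1, 12)
mean0, lambda2_true, _, mu4_true = central_moments(
    {v - ORIGIN: f for v, f in ungrouped.items()}
)
```

Exact rational arithmetic is used so that the results can be compared digit for digit with the fractions over powers of 60 in the table.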


It may appear at first glance that since 

Pr. = EMSpiY] 

and could be expressed by means of the notation, pu:,^,vY, the notation in ( 12 ) 
and (13) could be made more uniform. It could be but at the expense of greater 
complexity in these two sets of results. Moreover, it is convenient that Xr, 
is expressible in terms of in precisely the same way that product semin- 
variants are ordinarily expressible in terms of product moments. 

Pierce's results differ from the above not only in their mode of derivation but 
also in the fact that they express Pro’s and Vb in terms of the characteristics of 
the ungrouped distribution and moments and seminvariants of moments in the
grouped distributions. Thus as they stand they are not formulae for Sheppard's 
corrections. 

Finally it must be remarked that in comparison with the usual formulae for 
Sheppard's corrections, the formulae (10) and (13) introduce quantities the
magnitudes of which are not known in general except that ordinarily they are 
quite small. It is hoped that results on this point will be forthcoming soon. 
ON THE ANALYSIS OF VARIANCE IN CASE OF MULTIPLE
CLASSIFICATIONS WITH UNEQUAL CLASS FREQUENCIES

By Abraham Wald*

Columbia University

In a previous paper¹ the author considered the case of a single criterion of
classification with unequal class frequencies and derived confidence limits for
$\sigma'^2/\sigma^2$, where $\sigma'^2$ denotes the variance associated with the classification, and $\sigma^2$
denotes the residual variance. The aim of the present paper is to extend
those results to the case of multiple classifications with unequal class frequencies.

For the sake of simplicity of notations we will derive the required confidence 
limits in the case of a two-way classification, the extension to multiple classifica- 
tions being obvious. 

Consider a two-way classification with p rows and q columns. Let y be the
observed variable, and let $n_{ij}$ be the number of observations in the ith row and
jth column. Denote by $y_{ijk}$ the kth observation on y in the ith row and jth
column ($k = 1, \dots, n_{ij}$). Let the total number of observations be N. We
order the N observations and let $y_\alpha$ be the $\alpha$th observation on y in that order.
Consider the variables:

    $t, t_1, \dots, t_p, v_1, \dots, v_q$,

and denote by $t_\alpha$ the $\alpha$th observation on t, by $t_{i\alpha}$ the $\alpha$th observation on $t_i$ and
by $v_{j\alpha}$ the $\alpha$th observation on $v_j$. The values of $t_\alpha$, $t_{i\alpha}$ and $v_{j\alpha}$ are defined as
follows:

    $t_\alpha = 1$  ($\alpha = 1, \dots, N$),

    $t_{i\alpha} = 1$ if $y_\alpha$ lies in the ith row,

    $t_{i\alpha} = 0$ if $y_\alpha$ does not lie in the ith row,

    $v_{j\alpha} = 1$ if $y_\alpha$ lies in the jth column,

    $v_{j\alpha} = 0$ if $y_\alpha$ does not lie in the jth column.
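
The indicator variables just defined can be written out for a small layout. This is a minimal sketch with hypothetical cell counts (the numbers in `counts` are illustrative only, not data from the paper); it also makes visible that the row indicators of every observation add up to t, and likewise for the column indicators.

```python
# Hypothetical cell counts n_ij for a p x q layout (illustrative values only).
counts = [[2, 1, 3],
          [1, 0, 2]]
p, q = len(counts), len(counts[0])

# Order the N observations cell by cell and record the (row, column) of each.
cells = [(i, j) for i in range(p) for j in range(q) for _ in range(counts[i][j])]
N = len(cells)

# t_alpha = 1 for every observation.
t = [1] * N
# t_rows[i][alpha] = 1 if observation alpha lies in row i, else 0.
t_rows = [[1 if i == r else 0 for (i, _) in cells] for r in range(p)]
# v_cols[j][alpha] = 1 if observation alpha lies in column j, else 0.
v_cols = [[1 if j == c else 0 for (_, j) in cells] for c in range(q)]
```

Because the p row indicators sum to t, only p - 1 of them (and q - 1 of the column indicators) can enter a regression together with t, which is why one row and one column variable are dropped from the regression.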

We make the assumptions

    $y_{ijk} = x_{ijk} + \epsilon_i + \eta_j$,

where the variates $x_{ijk}$, $\epsilon_i$, $\eta_j$ ($i = 1, \dots, p$; $j = 1, \dots, q$; $k = 1, \dots, n_{ij}$)
are independently and normally distributed, the variance of $x_{ijk}$ is $\sigma^2$, the
variance of $\epsilon_i$ is $\sigma'^2$, the variance of $\eta_j$ is $\sigma''^2$, and the mean values of $\epsilon_i$ and $\eta_j$ are
zero.


* Research under a grant-in-aid from the Carnegie Corporation of New York. 

¹ “A note on the analysis of variance with unequal class frequencies,” Annals of Math.
Stat., Vol. 11 (1940).
Let the sample regression of y on $t, t_1, \dots, t_{p-1}, v_1, \dots, v_{q-1}$ be

    $Y = at + b_1 t_1 + \cdots + b_{p-1} t_{p-1} + d_1 v_1 + \cdots + d_{q-1} v_{q-1}$.

We want to derive confidence limits for $\lambda^2 = \sigma'^2/\sigma^2$.

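Let us introduce the notations:

    $\sum_\alpha t_\alpha^2 = a_{00}$,
    $\sum_\alpha t_\alpha t_{i\alpha} = a_{0i}$  ($i = 1, \dots, p - 1$),
    $\sum_\alpha t_\alpha v_{j\alpha} = a_{0\,p-1+j}$  ($j = 1, \dots, q - 1$),
    $\sum_\alpha t_{i\alpha} t_{j\alpha} = a_{ij}$  ($i, j = 1, \dots, p - 1$),
    $\sum_\alpha t_{i\alpha} v_{j\alpha} = a_{i\,p-1+j}$  ($i = 1, \dots, p - 1$; $j = 1, \dots, q - 1$),
    $\sum_\alpha v_{i\alpha} v_{j\alpha} = a_{p-1+i\,p-1+j}$  ($i, j = 1, \dots, q - 1$),

    $\| c_{ij} \| = \| a_{ij} \|^{-1}$  ($i, j = 0, 1, \dots, p + q - 2$).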

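Let the regression of $x_{ijk}$ on $t, t_1, \dots, t_{p-1}, v_1, \dots, v_{q-1}$ be

    $X = a^* t + b_1^* t_1 + \cdots + b_{p-1}^* t_{p-1} + d_1^* v_1 + \cdots + d_{q-1}^* v_{q-1}$.

The regression of $\epsilon_i + \eta_j$ on the same independent variables is evidently equal to

    $\epsilon_1 t_1 + \cdots + \epsilon_p t_p + \eta_1 v_1 + \cdots + \eta_q v_q$
    $= (\eta_q + \epsilon_p) t + (\epsilon_1 - \epsilon_p) t_1 + \cdots + (\epsilon_{p-1} - \epsilon_p) t_{p-1}$
    $\;\; + (\eta_1 - \eta_q) v_1 + \cdots + (\eta_{q-1} - \eta_q) v_{q-1}$,

since $t_p = t - t_1 - \cdots - t_{p-1}$ and $v_q = t - v_1 - \cdots - v_{q-1}$. Hence

(1)    $b_i = b_i^* + (\epsilon_i - \epsilon_p)$,  ($i = 1, \dots, p - 1$),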


and therefore

(2)    $\sigma_{b_i b_j} = \sigma_{b_i^* b_j^*} + \sigma_{(\epsilon_i - \epsilon_p)(\epsilon_j - \epsilon_p)} = c_{ij}\sigma^2 + (1 + \delta_{ij})\sigma'^2$
       $= [c_{ij} + (1 + \delta_{ij})\lambda^2]\sigma^2$  ($i, j = 1, \dots, p - 1$),

where $\delta_{ij}$ is the Kronecker delta, i.e. $\delta_{ij} = 0$ for $i \ne j$ and $\delta_{ii} = 1$. Denote
$c_{ij} + (1 + \delta_{ij})\lambda^2$ by $c'_{ij}$. Since the expected value of $b_i^*$ is equal to zero, on
account of (1) also the expected value of $b_i$ is equal to zero. Let

    $\| g_{ij} \| = \| c'_{ij} \|^{-1}$  ($i, j = 1, \dots, p - 1$).

Then

(3)    $\frac{1}{\sigma^2} \sum_{i=1}^{p-1} \sum_{j=1}^{p-1} g_{ij} b_i b_j$

has the $\chi^2$-distribution with p - 1 degrees of freedom. The expression

(4)    $\frac{1}{\sigma^2} \sum_{\alpha=1}^{N} (y_\alpha - Y_\alpha)^2$

has the $\chi^2$-distribution with N - p - q + 1 degrees of freedom. The expressions
(3) and (4) are independently distributed. Hence

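(5)    $\frac{N - p - q + 1}{p - 1} \cdot \frac{\sum\sum g_{ij} b_i b_j}{\sum_\alpha (y_\alpha - Y_\alpha)^2}$

has the F-distribution (analysis of variance distribution). We will now show
that (5) is a monotonic function of $\lambda^2$. It is known that $\sum\sum g_{ij} b_i b_j$ is invariant
under linear transformations, i.e.

    $\sum\sum g_{ij} b_i b_j = \sum\sum g'_{ij} b'_i b'_j$,

where $b'_i$ is an arbitrary linear function, say $\mu_{i1} b_1 + \cdots + \mu_{i\,p-1} b_{p-1}$, of $b_1, \dots,
b_{p-1}$ ($i = 1, \dots, p - 1$) and

    $\| g'_{ij} \| = \sigma^2 \| \sigma_{b'_i b'_j} \|^{-1}$.

We can choose the matrix $\| \mu_{ij} \|$ such that

    $\epsilon'_i = \mu_{i1}(\epsilon_1 - \epsilon_p) + \cdots + \mu_{i\,p-1}(\epsilon_{p-1} - \epsilon_p)$,  ($i = 1, \dots, p - 1$),

are independently distributed and $\sigma^2_{\epsilon'_i} = \sigma'^2$. The coefficients $\mu_{ij}$ of course do
not depend on $\sigma'$. We have

    $\sigma_{b'_i b'_j} = \sigma_{b'^*_i b'^*_j} + \delta_{ij}\sigma'^2$  ($\delta_{ij}$ = Kronecker delta).

Now let

    $b''_i = \nu_{i1} b'_1 + \cdots + \nu_{i\,p-1} b'_{p-1}$,  ($i = 1, \dots, p - 1$),

where $\| \nu_{ij} \|$ is an orthogonal matrix and is chosen such that $b''^*_1, \dots, b''^*_{p-1}$
are independently distributed. On account of the orthogonality of $\| \nu_{ij} \|$ we
obviously have

    $\sigma^2_{b''_i} = \sigma^2_{b''^*_i} + \sigma'^2$;   $\sigma_{b''_i b''_j} = 0$ for $i \ne j$.

Hence

(6)    $\sum\sum g_{ij} b_i b_j = \sum_{i=1}^{p-1} \frac{\sigma^2 b''^2_i}{\sigma^2_{b''^*_i} + \lambda^2\sigma^2}$.

The right hand side of (6) is evidently a monotonic function of $\lambda^2$, which proves
our statement. The endpoints of the confidence interval for $\lambda^2$ are the roots
in $\lambda^2$ of the equations

(7)    $\frac{N - p - q + 1}{p - 1} \cdot \frac{\sum\sum g_{ij} b_i b_j}{\sum (y_\alpha - Y_\alpha)^2} = F_2$;   $\frac{N - p - q + 1}{p - 1} \cdot \frac{\sum\sum g_{ij} b_i b_j}{\sum (y_\alpha - Y_\alpha)^2} = F_1$,

where $F_2$ denotes the upper, and $F_1$ the lower critical value of F.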
The derivation of the required confidence limits in case of classifications in
more than two ways can be carried out in the same way and I merely state
here the results.

Consider r criteria of classification and denote by $p_u$ the number of classes
in the uth classification ($u = 1, \dots, r$). Denote by $n_{i_1 \cdots i_r}$ the number of
observations which belong to the $i_1$th class of the first classification, the $i_2$th class
of the second classification, ..., and to the $i_r$th class of the rth classification.
Let $y_{i_1 \cdots i_r k}$ be the kth observation on y in the set of observations belonging to
the classes mentioned above ($k = 1, \dots, n_{i_1 \cdots i_r}$). We make the assumption

    $y_{i_1 \cdots i_r k} = x_{i_1 \cdots i_r k} + \epsilon^{(1)}_{i_1} + \cdots + \epsilon^{(r)}_{i_r}$,

where the variates

    $x_{i_1 \cdots i_r k}$, $\epsilon^{(u)}_{i_u}$  ($i_u = 1, \dots, p_u$; $u = 1, \dots, r$; $k = 1, \dots, n_{i_1 \cdots i_r}$)

are independently and normally distributed, the variance of $x_{i_1 \cdots i_r k}$ is $\sigma^2$, the
variance of $\epsilon^{(u)}_{i_u}$ is $\sigma_u^2$ and the mean value of $\epsilon^{(u)}_{i_u}$ is zero ($i_u = 1, \dots, p_u$;
$u = 1, \dots, r$).

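Let N be the total number of observations. We order the observations in a
certain order and denote by $y_\alpha$ the $\alpha$th observation in that order ($\alpha = 1, \dots, N$).
Consider the variables:

    $t$, $t^{(u)}_{i_u}$  ($u = 1, \dots, r$; $i_u = 1, \dots, p_u$),

and denote by $t_\alpha$ the $\alpha$th observation on t and by $t^{(u)}_{i_u \alpha}$ the $\alpha$th observation on
$t^{(u)}_{i_u}$. The values of $t_\alpha$ and $t^{(u)}_{i_u \alpha}$ are given as follows:

    $t_\alpha = 1$  ($\alpha = 1, \dots, N$),

    $t^{(u)}_{i_u \alpha} = 1$ if $y_\alpha$ lies in the $i_u$th class of the uth classification,

    $t^{(u)}_{i_u \alpha} = 0$ if $y_\alpha$ does not lie in the $i_u$th class of the uth classification.

Let the sample regression of y on t, $t^{(u)}_{i_u}$ ($u = 1, \dots, r$; $i_u = 1, \dots, p_u - 1$) be given by

    $Y = at + \sum_{u=1}^{r} \sum_{i_u=1}^{p_u - 1} b^{(u)}_{i_u} t^{(u)}_{i_u}$.

Let the covariance of $b^{(u)}_i$ and $b^{(u)}_j$ be given by $c^{(u)}_{ij}\sigma^2$ under the assumption
that $\sigma_1 = \sigma_2 = \cdots = \sigma_r = 0$. The matrix $\| c^{(u)}_{ij} \|$ ($i, j = 1, \dots, p_u - 1$)
can be calculated by known methods of the theory of least squares. Let

    $\| g^{(u)}_{ij} \| = \| c^{(u)}_{ij} + (1 + \delta_{ij})\lambda_u^2 \|^{-1}$  ($i, j = 1, \dots, p_u - 1$),

where $\delta_{ij}$ is the Kronecker delta and $\lambda_u^2 = \sigma_u^2/\sigma^2$. Then the lower and upper
confidence limits for $\lambda_u^2$ are given by the roots in $\lambda_u^2$ of the equations

(8)    $\frac{N - \sum_{u=1}^{r} p_u + r - 1}{p_u - 1} \cdot \frac{\sum\sum g^{(u)}_{ij} b^{(u)}_i b^{(u)}_j}{\sum_{\alpha=1}^{N} (y_\alpha - Y_\alpha)^2} = F_i$  ($i = 1, 2$),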





where $F_2$ is the upper and $F_1$ the lower critical value of the analysis of variance
distribution with $p_u - 1$ and $N - \sum_{u=1}^{r} p_u + r - 1$ degrees of freedom. In
case of a single criterion of classification the confidence limits (8) are identical
with those given in my previous paper.


THE FREQUENCY DISTRIBUTION OF A GENERAL MATCHING
PROBLEM

By T. N. E. Greville
Bureau of the Census

1. Introduction. This paper considers the matching of two decks of cards of
arbitrary composition, and the complete frequency distribution of correct
matchings is obtained, thus solving a problem proposed by Stevens.¹ It is also
shown that the results can be interpreted in terms of a contingency table.

Generalizing a problem considered by Greenwood,² let us consider the matching
of two decks of cards consisting of t distinct kinds, all the cards of each kind being
identical. The first or “call” deck will be composed of $i_1$ cards of the first kind,
$i_2$ of the second, etc., such that

    $i_1 + i_2 + i_3 + \cdots + i_t = n$;

and the second or “target” deck will contain $j_1$ cards of the first kind, $j_2$ of the
second, etc., such that

    $j_1 + j_2 + \cdots + j_t = n$.

Any of the i's or j's may be zero. It is desired to calculate, for a given arrange-
ment of the “call” deck, the number of possible arrangements of the “target”
deck which will produce exactly r matchings between them (r = 0, 1, 2, ..., n).
It is clear that these frequencies are independent of the arrangement of the call
deck. For convenience the call deck may be thought of as arranged so that all
the cards of the first kind come first, followed by all those of the second kind,
and so on.


2. Formulae for the frequencies. Let us consider the number of arrange-
ments of the target deck which will match the cards in the $k_1$th, $k_2$th, ..., $k_s$th
positions in the call deck, regardless of whether or not matchings occur elsewhere.
Let the cards in these s positions in the call deck consist of $c_1$ of the first kind,
$c_2$ of the second, etc. Then:

    $c_1 + c_2 + \cdots + c_t = s$.

The number of such arrangements of the target deck is

(1)    $\frac{(n - s)!}{\prod_{h=1}^{t} (j_h - c_h)!}$.


¹ W. L. Stevens, Annals of Eugenics, Vol. 8 (1937), pp. 238-244.

² J. A. Greenwood, Annals of Math. Stat., Vol. 9 (1938), pp. 58-59.
For fixed values of the c's, the s specified positions may be selected in

(2)    $\prod_{h=1}^{t} \frac{i_h!}{c_h!\,(i_h - c_h)!}$

ways.

Consider now the expression

(3)    $V_s = \sum \frac{(n - s)!\,\prod_{h=1}^{t} i_h!}{\prod_{h=1}^{t} c_h!\,(i_h - c_h)!\,(j_h - c_h)!}$

obtained by summing the product of (1) and (2) over all sets of values of the
numbers $c_1, c_2, \dots, c_t$ satisfying the conditions:

    $0 \le c_h \le i_h$,  $c_h \le j_h$,  and  $\sum_{h=1}^{t} c_h = s$.

Let $W_s$ denote the number of arrangements of the target deck which result in
exactly s matchings. Then it is evident that $V_s$ exceeds $W_s$, since the former
includes those arrangements which give more than s matchings, and these,
moreover, are counted more than once. Consider an arrangement which
produces u matchings, where u > s. Such an arrangement will be counted
once in $V_s$ for every set of s matchings which can be selected from the total of
u, that is, ${}^uC_s$ times. In other words,

    $V_r = W_r + {}^{r+1}C_r W_{r+1} + {}^{r+2}C_r W_{r+2} + \cdots + {}^{n}C_r W_n$.

It has been shown³ that the solution of these equations is

(4)    $W_r = V_r - {}^{r+1}C_r V_{r+1} + {}^{r+2}C_r V_{r+2} - \cdots + (-1)^{n-r}\,{}^{n}C_r V_n$.
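
Formulae (3) and (4) can be compared with a direct enumeration for small decks. The following is a sketch only; the function names and the example decks are illustrative assumptions, and the brute-force count is practical only for very small n.

```python
from itertools import permutations, product
from math import comb, factorial, prod


def brute_force_W(call, target):
    """W[r] = number of distinct arrangements of `target` giving exactly r matches."""
    W = [0] * (len(call) + 1)
    for arrangement in set(permutations(target)):
        W[sum(a == b for a, b in zip(call, arrangement))] += 1
    return W


def V(s, i, j):
    """Formula (3): sum over c_1..c_t with 0 <= c_h <= min(i_h, j_h), sum c_h = s."""
    n = sum(i)
    total = 0
    for c in product(*(range(min(ih, jh) + 1) for ih, jh in zip(i, j))):
        if sum(c) != s:
            continue
        positions = prod(comb(ih, ch) for ih, ch in zip(i, c))           # (2)
        arrangements = factorial(n - s) // prod(
            factorial(jh - ch) for jh, ch in zip(j, c))                  # (1)
        total += positions * arrangements
    return total


def W_from_V(i, j):
    """Formula (4): W_r = sum over s >= r of (-1)^(s-r) C(s, r) V_s."""
    n = sum(i)
    Vs = [V(s, i, j) for s in range(n + 1)]
    return [sum((-1) ** (s - r) * comb(s, r) * Vs[s] for s in range(r, n + 1))
            for r in range(n + 1)]


# Illustrative decks: kinds 0, 1, 2 with i = (2, 1, 1) and j = (1, 2, 1).
i_counts, j_counts = (2, 1, 1), (1, 2, 1)
call_deck = [0, 0, 1, 2]
target_deck = [0, 1, 1, 2]
```

Calling `W_from_V(i_counts, j_counts)` and `brute_force_W(call_deck, target_deck)` should return the same list of frequencies.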


3. Computation of the frequencies. Equations (3) and (4) apparently give
the solution of the problem, but in practice the labor of carrying out the sum-
mation indicated in (3) would often be very great. However, (3) may be re-
written in the form

(5)    $V_s = \frac{(n - s)!\,H_s}{\prod_{h=1}^{t} j_h!}$,

where

    $H_s = \sum \prod_{h=1}^{t} \frac{i_h!\,j_h!}{c_h!\,(i_h - c_h)!\,(j_h - c_h)!}$.

³ H. Geiringer, Annals of Math. Stat., Vol. 9 (1938), p. 202.
It will be seen that $H_s$ is the coefficient of $x^s$ in the product

(6)    $\prod_{h=1}^{t} \left\{ \sum_{k=0}^{l_h} \frac{i_h!\,j_h!\,x^k}{k!\,(i_h - k)!\,(j_h - k)!} \right\}$,

where $l_h$ denotes the smaller of $i_h$ and $j_h$. The factor $\prod_{h=1}^{t} j_h!$ was included in
$H_s$ in order to make the coefficients in the polynomials of (6) always integers.
Equation (4) may now be written in the form

    $W_r = \sum_{s=r}^{n} (-1)^{s-r}\,{}^{s}C_r \frac{(n - s)!\,H_s}{\prod_{h=1}^{t} j_h!}$,

or

(7)    $W_r = \frac{1}{r!\,\prod_{h=1}^{t} j_h!} \sum_{s=r}^{n} (-1)^{s-r} \frac{s!\,(n - s)!}{(s - r)!} H_s$,

a form which lends itself to actual computation.
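
As a sketch of how (6) and (7) might be carried out by machine, the polynomial product can be accumulated one kind of card at a time. The function names below are illustrative assumptions and are not taken from the paper.

```python
from math import comb, factorial, prod


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(a) + len(b) - 1)
    for x, ca in enumerate(a):
        for y, cb in enumerate(b):
            out[x + y] += ca * cb
    return out


def H_coefficients(i, j):
    """Coefficients H_0, ..., H_n of the product (6)."""
    n = sum(i)
    poly = [1]
    for ih, jh in zip(i, j):
        # i_h! j_h! / (k! (i_h - k)! (j_h - k)!) = C(i_h, k) * j_h! / (j_h - k)!
        factor = [comb(ih, k) * (factorial(jh) // factorial(jh - k))
                  for k in range(min(ih, jh) + 1)]
        poly = poly_mul(poly, factor)
    return poly + [0] * (n + 1 - len(poly))


def W_from_H(i, j):
    """Formula (7): frequencies W_0, ..., W_n of exactly r matchings."""
    n = sum(i)
    H = H_coefficients(i, j)
    denom = prod(factorial(jh) for jh in j)
    W = []
    for r in range(n + 1):
        numerator = sum(
            (-1) ** (s - r) * (factorial(s) // factorial(s - r))
            * factorial(n - s) * H[s]
            for s in range(r, n + 1)
        )
        W.append(numerator // (factorial(r) * denom))
    return W
```

For three distinct cards in each deck, `W_from_H((1, 1, 1), (1, 1, 1))` reproduces the familiar count of permutations of three objects by number of fixed points.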


4. Factorial moments. The factorial moments of the frequency distribution
of the number of matchings are easy to compute. Let $m_s$ denote the sth factorial
moment, so that

(8)    $m_s = \frac{\sum_{r=0}^{n} r^{(s)} W_r}{\sum_{r=0}^{n} W_r}$.

Substituting from (4),

    $\sum_{r=0}^{n} r^{(s)} W_r = \sum_{r=0}^{n} \left\{ r^{(s)} \sum_{u=r}^{n} (-1)^{u-r}\,{}^{u}C_r V_u \right\}$.

Reversing the order of summation and simplifying,

    $\sum_{r=0}^{n} r^{(s)} W_r = \sum_{u=s}^{n} \left\{ u^{(s)} V_u \sum_{r=s}^{u} (-1)^{u-r}\,{}^{u-s}C_{r-s} \right\} = s!\,V_s$.

Hence,

(9)    $V_0 = \sum_{r=0}^{n} W_r = \frac{n!}{\prod_{h=1}^{t} j_h!}$,

and from (5) and (8),

(10)    $m_s = \frac{s!\,(n - s)!\,H_s}{n!}$.

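5. Mean and variance. From (6)

(11)    $H_1 = \sum_{h=1}^{t} i_h j_h$

and

(12)    $H_2 = \frac{1}{2} \sum_{h=1}^{t} i_h (i_h - 1) j_h (j_h - 1) + \frac{1}{2} \sum_{h \ne k} i_h i_k j_h j_k$.

Hence the mean number of matchings is

(13)    $m_1 = \frac{\sum_{h=1}^{t} i_h j_h}{n}$.

The variance $\mu_2$ is

    $m_2 + m_1 - m_1^2 = \frac{1}{n^2 (n - 1)} \Big[ n \sum_h i_h (i_h - 1) j_h (j_h - 1) + 2n \sum_{h<k} i_h i_k j_h j_k$
    $\;\;\; + n(n - 1) \sum_h i_h j_h - (n - 1) \big( \sum_h i_h j_h \big)^2 \Big]$,

or

(14)    $\mu_2 = \frac{1}{n^2 (n - 1)} \Big[ \big( \sum_h i_h j_h \big)^2 - n \sum_h i_h j_h (i_h + j_h) + n^2 \sum_h i_h j_h \Big]$.

In the special case $j_1 = j_2 = \cdots = j_t = j$, these formulae become

    $m_1 = j$,   $\mu_2 = \frac{j \big( n^2 - \sum_h i_h^2 \big)}{n (n - 1)}$.

These formulae have previously been given by Stevens,⁴ and those for the
special case also by Greenwood. The maximal conditions for the variance,
given by Greenwood for this particular case, apparently can not be put in a simple
form for the general case.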


6. Unequal decks. Suppose the call deck contains m cards, m < n, and is to 
be matched with m cards selected from the target deck. It can be assumed 
without loss of generality that the first m cards in any arrangement of the target 
deck are the ones to be used. The formulae of this paper can be applied to this 


⁴ W. L. Stevens, Annals of Eugenics, loc. cit.; Psychol. Review, Vol. 46 (1939), pp. 142-160.





more general problem by the expedient of imagining n - m blank cards to be
added at the end of the call deck and regarding these as an additional kind.
It is thus apparent that formulae (13) and (14) apply without modification to
this altered situation.


7. Application to contingency table. Stevens⁵ has considered the distribution
of entries in a contingency table with fixed marginal totals, and has pointed out
that the problem of matching two decks of cards may be dealt with from that
standpoint. A contingency table classifies data into n columns and m rows,
and we may consider the row as indicating the kind of card which occupies
a given position in the call deck, the columns having the same function with
respect to the target deck. Stevens defines a quantity c as the sum of entries
in a prescribed set of cells, subject to the condition that no two cells of the set
are in the same row or column, and mentions as unsolved the problem of the
exact sampling distribution of c.

We now have at our disposal the machinery for solving this problem. Fol-
lowing Stevens’s notation, let $a_1, a_2, \dots, a_m$ denote the fixed row totals and
$b_1, b_2, \dots, b_n$ the fixed column totals, while $x_{rs}$ denotes the frequency of the
cell in the rth row and the sth column. Then, let $c = \sum_{h=1}^{l} x_{r_h s_h}$, where l does
not exceed either m or n. Imagine two decks of N cards,
the first containing $a_1$ cards of one kind, $a_2$ of another, etc., and the second
containing $b_1$ cards of one kind, $b_2$ of another, etc. Moreover, let the $r_h$th kind
in the first deck and the $s_h$th kind in the second deck be the same kind (h =
1, 2, ..., l), the other kinds being all different. Evidently c is the number of
matchings between the two decks. Hence, the methods of this paper can be
used to obtain the distribution of c. The formulae we have obtained agree with
those for the expected value and variance of c given by Stevens.


ON METHODS OF SOLVING NORMAL EQUATIONS 

By Paul G. Hoel
University of California, Los Angeles

There seems to be considerable disagreement concerning what is the most 
satisfactory method of solving a set of normal equations. Since such informa- 
tion as errors of estimate and significance of results is usually desired in addition 
to the solution, in its broader aspects the problem is one of deciding what is the 
most satisfactory method of calculating the inverse of a symmetric matrix. 

⁵ W. L. Stevens, Annals of Eugenics, loc. cit.

For equations with several unknowns some compact systematic method of
calculation is necessary to eliminate much of the labor involved in the ordinary
method of calculating the inverse from its definition. Among the more common
of such systematic methods are those associated with the names of Chio,¹ Gauss,¹
Doolittle,² and Aitken.³ In addition, A. A. Albert⁴ recently called attention
to a method implicit in elementary matrix theory. There are also various
iterative schemes, and schemes which are but slight variations of the above
methods. In this note only the methods associated with the above names will
be considered, and for convenience they will be labeled with those names, regard-
less of who should be given credit for them.

The purpose of this note is to show that when the calculation of the inverse is 
systematized, all of the above methods are fundamentally equivalent and merely 
involve a different arrangement of work. Consequently, any advantage in calcu- 
lating time for any particular method will arise through such features as a 
simpler technique or less copying, rather than through fewer multiplications and 
divisions. 

By the method of Chio is meant the evaluation of determinants by the pivotal
method of reduction. Since all of the methods mentioned above use pivotal
reduction, the method of Chio will not be treated as a distinct method. Fur-
thermore, since Gauss’ method is incorporated in that of Aitken, it will be neces-
sary to consider only the methods of Aitken, Doolittle, and Albert as distinct.

First consider the method of Albert, which is based on the following matrix
properties. Let the matrix A be subjected to a sequence of row transformations
leading to the matrix A'. Then, writing A = IA, it follows from a theorem in
matrix theory that A' = I'A, and consequently that $A'A^{-1} = I'$. If row trans-
formations are chosen which make A' = I, then $A^{-1} = I'$. This states
that if the same row transformations are applied to the identity matrix as were
used to reduce A to the identity matrix, then the resulting matrix will be the
desired inverse. The customary manner of reducing A to I is to work for zeros
in columns as follows:

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\to
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & \left(a_{22} - a_{12}\frac{a_{21}}{a_{11}}\right) & \cdots & \left(a_{2n} - a_{1n}\frac{a_{21}}{a_{11}}\right) \\
\vdots & \vdots & & \vdots \\
0 & \left(a_{n2} - a_{12}\frac{a_{n1}}{a_{11}}\right) & \cdots & \left(a_{nn} - a_{1n}\frac{a_{n1}}{a_{11}}\right)
\end{pmatrix}
\to \cdots
$$


¹ See, for example, Whittaker and Robinson, The Calculus of Observations, p. 71 and p.
234.

² See, for example, Croxton and Cowden, Applied General Statistics, 1939, p. 716.

³ Roy. Soc. Edin. Proc., Vol. 57 (1936-37), p. 172.

⁴ Am. Math. Monthly, Vol. 48, No. 3 (1941), p. 198.
where new letters are introduced for new elements after each reduction. After
zeros are obtained below the main diagonal, zeros are obtained above the
diagonal by starting with the last column. If now these operations are per-
formed in the same order on I, the result will be $A^{-1}$.
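
The two sets of row operations described for Albert's method can be sketched in a few lines. This is an illustration under stated assumptions, not the author's computing scheme: exact fractions replace desk-machine decimals, no row interchanges are made (so a zero pivot simply raises an error), and the function name and example matrix are hypothetical.

```python
from fractions import Fraction


def inverse_by_row_operations(a):
    """Reduce A to I by row operations and apply the same operations to I."""
    n = len(a)
    left = [[Fraction(x) for x in row] for row in a]
    right = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

    def eliminate(r, k):
        # Subtract a multiple of row k from row r to put a zero in column k.
        m = left[r][k] / left[k][k]
        left[r] = [x - m * y for x, y in zip(left[r], left[k])]
        right[r] = [x - m * y for x, y in zip(right[r], right[k])]

    # First set of operations: zeros below the main diagonal.
    for k in range(n):
        if left[k][k] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row interchange")
        for r in range(k + 1, n):
            eliminate(r, k)

    # Second set: zeros above the diagonal, starting with the last column.
    for k in range(n - 1, -1, -1):
        for r in range(k):
            eliminate(r, k)

    # Divide each row by its diagonal element so that the left side becomes I.
    return [[x / left[k][k] for x in right[k]] for k in range(n)]


# Illustrative symmetric matrix of the kind arising from normal equations.
A = [[4, 2, 2],
     [2, 3, 1],
     [2, 1, 3]]
A_inv = inverse_by_row_operations(A)
```

Multiplying `A` by `A_inv` gives the identity matrix exactly, since all arithmetic is rational.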

Next consider the method of Aitken, which is based on the evaluation of a
bordered determinant, namely,

$$
\begin{vmatrix}
a_{11} & \cdots & a_{1j} & \cdots & a_{1n} & 0 \\
\vdots & & \vdots & & \vdots & \vdots \\
a_{i1} & \cdots & a_{ij} & \cdots & a_{in} & 1 \\
\vdots & & \vdots & & \vdots & \vdots \\
a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} & 0 \\
0 & \cdots & -1 & \cdots & 0 & 0
\end{vmatrix}
= \text{cofactor of } a_{ij}.
$$

To obtain $A^{-1}$ it is merely necessary to evaluate determinants of this type and
divide them by | A |. Aitken’s method evaluates all such determinants simulta-
neously, using Chio's reduction technique in much the same manner as illustrated
above with Albert’s method. Thus,


$$
\left[\begin{array}{cccc|cccc}
a_{11} & a_{12} & \cdots & a_{1n} & 1 & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & a_{2n} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} & 0 & 0 & \cdots & 1 \\
\hline
-1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & -1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & -1 & 0 & 0 & \cdots & 0
\end{array}\right]
\to
\left[\begin{array}{ccc|cccc}
b_{22} & \cdots & b_{2n} & -\frac{a_{21}}{a_{11}} & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
b_{n2} & \cdots & b_{nn} & -\frac{a_{n1}}{a_{11}} & 0 & \cdots & 1 \\
\hline
\frac{a_{12}}{a_{11}} & \cdots & \frac{a_{1n}}{a_{11}} & \frac{1}{a_{11}} & 0 & \cdots & 0 \\
-1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & -1 & 0 & 0 & \cdots & 0
\end{array}\right]
\to \cdots,
\qquad b_{ij} = a_{ij} - a_{1j}\frac{a_{i1}}{a_{11}}.
$$

When zeros are obtained below the main diagonal to the left of the vertical
dividing line, the matrix in the lower right section will be $A^{-1}$. This follows from
the fact that the elements of this matrix will be the evaluations of bordered
determinants, like those of the previous paragraph, divided by $a_{11} b_{22} \cdots = | A |$.

It will be observed that the operations on A in Albert’s method which produce
zeros below the main diagonal are the same as those which occur above the hori-
zontal dividing line in Aitken’s method. This set of operations is performed
simultaneously on I, since the upper right section of Aitken’s scheme is I. Fur-
thermore, obtaining a zero for an element below the horizontal line and to the
left of the vertical line is equivalent to obtaining a zero for the element corre-
sponding to the same row and column in the section above the horizontal, pro-
vided the preceding columns contain zeros above the diagonal. But obtaining
zeros above the main diagonal of A constitutes the second set of operations in
Albert’s method to obtain A' = I. Thus, the operations in Aitken’s method
which produce zeros in a given column for elements above the horizontal line
are merely the first set of operations in Albert’s method, while those which
produce zeros below the horizontal line are the second set of operations in reverse
order. Since, in Aitken’s scheme, the first set of operations is performed on I
in the upper right section and the results are transferred a row at a time to the
lower right section, where they are in turn operated upon by the second set of
operations, this lower right section is merely I operated upon by the entire set
of operations of Albert’s method. Consequently, Aitken’s and Albert’s methods
are the same except for the order in which operations are performed and differ-
ences arising therefrom. Since Aitken’s method performs these operations more
compactly, it is to be preferred to that of Albert.

Next consider the method of Doolittle, which is described by following 
the instructions given in the first column in the table shown on page 348. 
The forward solution is completed after n such sectional operations. For a 
given k column, the backward solution is obtained as usual by substitution in 
the last row of each section taken in reverse order. 

If all summations in each section are performed in pairs and the sums recorded
each time, rather than being performed in one operation, the forward solution
of the Doolittle method will be found to be a rearrangement of the work occurring
above the horizontal line in Aitken’s method. Thus the first lines of each
section give the matrix above the horizontal line in Aitken’s scheme. Then,
except for signs, I' and the sums of the first two lines of the remaining sections
give the result of Aitken’s first sequence of operations above the horizontal.
Then, except for signs, II' and the sums of the first three lines of the remaining
sections give the result of Aitken’s second sequence of operations above the
horizontal, etc.

The back solution involves precisely the same operations as those making up 
the second set of Albert’s sequence of operations to obtain zeros above the main 
diagonal. Since these were shown to be a rearrangement of operations in 
Aitken’s method, it follows that the methods of Aitken and Doolittle are the 
same except for the order of operations and differences arising therefrom. Hence 
all three methods are basically the same when systematized for a calculating 
machine. 

Because of this equivalence, the number of necessary multiplications and
divisions will be the same for all three methods, and will be found to be
$\frac{1}{2} n^2 (n + 1)$. Since Aitken’s method is to be preferred to that of Albert, it will
suffice to compare the methods of Aitken and Doolittle for calculating con-
venience.

The Doolittle method possesses several distinct advantages. First, its multi-
plications occur a row at a time with one of the factors constant for that row;
consequently the keyboard remains unchanged for a given row of operations.
Aitken’s method, however, consists of calculating successive cross products,
which requires clearing of the keyboard after each such operation. Secondly,
there are fewer additions in the Doolittle method. It sums i quantities at a
time in section i, while Aitken’s cross products always involve the sum of two
quantities. Because of the necessity of calculating the complements of negative
sums, this difference becomes important when the number of variables is large.
A third feature in favor of the Doolittle method is the ease of performing the
calculations without previous experience. It may be easier to understand how
to calculate cross products, but actually the calculations of the Doolittle method
are easier to perform. Aitken’s method requires some experience with it, if one
is to avoid repeating certain calculations which would result from calculating all
cross products mechanically. The comparative amount of copying in the two
methods depends upon the number of variables involved.

From the above considerations, it may be concluded that the Doolittle method
is to be preferred among those considered in this paper for solving a set of normal
equations or calculating the inverse of a symmetric matrix. However, if a
single technique is desired which can be used for nonsymmetrical
equations as well, then the method of Aitken is to be preferred.
CONDITIONS THAT THE ROOTS OF A POLYNOMIAL BE LESS THAN
UNITY IN ABSOLUTE VALUE

By Paul A. Samuelson

Massachusetts Institute of Technology

1. Introduction. In econometric business cycle analysis, probability the-
ory, and numerical mathematical computation the problem of convergence of
repeated iterations arises. The solution of the difference equations defining
such a process can in a wide variety of cases be shown to be stable in the sense of
converging to a limit if a certain associated polynomial

(1)    $f(x) = p_0 x^n + p_1 x^{n-1} + \cdots + p_n = 0$,

has roots whose moduli are all less than unity.

Thus, for “timeless” linear difference equation systems of the most general
type, convertible into normal form,

(2)    $Q_i(t + 1) = \sum_{j=1}^{n} a_{ij} Q_j(t)$  ($i = 1, \dots, n$),

the polynomial is the characteristic or determinantal equation,

(3)    $f(x) = | a_{ij} - x\delta_{ij} | = 0$,

which when expanded out is of the form (1). The roots of this equation, when
multiplied by suitable polynomials in t, give the exact solution of the problem
in the form

(4)    $Q_i(t) = \sum_{j=1}^{m} g_{ij}(t)\,x_j^t$,

where m is the number of distinct roots, and the g's are polynomials of degree
one less than the multiplicity of the respective root. If complex roots occur,
they do so in conjugate pairs and can be combined to form damped, undamped,
or anti-damped harmonic terms. All terms go to zero as t approaches infinity
if, and only if, the absolute value of each x is less than unity.
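
The connection between the moduli of the roots of (3) and the convergence of the iteration (2) can be seen on a two-variable system. The sketch below is illustrative only: the two coefficient matrices are hypothetical examples, and the roots are obtained from the quadratic formula, which applies only to the 2 x 2 case.

```python
import cmath
import math


def char_roots_2x2(a):
    """Roots of |a_ij - x delta_ij| = 0 for a 2 x 2 system."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


def iterate(a, q, steps):
    """Apply Q(t + 1) = A Q(t) `steps` times, starting from q."""
    for _ in range(steps):
        q = [a[0][0] * q[0] + a[0][1] * q[1],
             a[1][0] * q[0] + a[1][1] * q[1]]
    return q


# Hypothetical systems, each with a conjugate pair of complex roots.
damped = [[0.5, -0.4], [0.4, 0.5]]      # modulus sqrt(0.41) < 1
explosive = [[1.0, -0.5], [0.5, 1.0]]   # modulus sqrt(1.25) > 1

damped_end = iterate(damped, [1.0, 0.0], 200)
explosive_end = iterate(explosive, [1.0, 0.0], 50)
```

In the first system the harmonic terms are damped and the iterates shrink toward zero; in the second they are anti-damped and grow without bound.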

For non-linear systems the exact solution does not take this form, but in the 
neighborhood of an equilibrium point the roots of an associated polynomial, 
except in singular cases, do determine the stability of the system. 

As far as the writer is aware, there does not appear in the literature an account
of necessary and sufficient conditions for the roots of a polynomial to be less than
unity in absolute value. This is in contrast to a related problem which arises
in connection with the investigation of stability of dynamical systems defined by
differential equations. These have associated with them a polynomial whose
roots provide solutions in the form

(5)    $Q_i(t) = \sum_{j=1}^{m} g_{ij}(t)\,e^{x_j t}$,

or for non-linear systems infinite power series in such terms. It is required,
therefore, to determine complete conditions under which the real parts of all
roots must be negative.

This problem has been solved by Routh¹ in a manner which leaves little to be
desired. Determinantal expression of his conditions in a slightly modified form
was made by Hurwitz,² who apparently was unaware of Routh's work, and by
Frazer and Duncan,³ who were unaware of the Hurwitz results. A brief outline
of Routh's mode of attack will prove instructive in dealing with the problem
at hand.

2. Routhian analysis of sign of real parts of roots. Routh realized
that the condition that all coefficients be positive (the leading coefficient having
been made so) was necessary, but not sufficient unless all the roots were real.
But a “derived” equation of degree n(n - 1)/2 whose roots equal the sums of the
roots of the original equation taken two at a time has real roots which are simple
sums of the real parts of those of the original equation. In consequence, it is
necessary and sufficient that the coefficients of the original and the “derived”
equation all be positive.

Thus, valid necessary and sufficient conditions are presented. However, they
are disadvantageous from two points of view. First, they are not all independent,
being n(n + 1)/2 conditions in number, whereas only n are necessary. Secondly,
despite several ingenious methods devised by Routh, it is not easy to
compute them in the general case.

Recognizing these difficulties, he therefore began anew from an entirely
different angle. Utilizing a theorem of Cauchy concerning the relationship
between the behavior of a polynomial on a closed contour in the complex domain
and the number of roots within that closed curve, he derived necessary and
sufficient conditions, which may be written in the slightly more convenient
determinantal form of Hurwitz and Frazer and Duncan as follows:

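(6)
$$T_0 = p_0 > 0, \quad T_1 = p_1 > 0, \quad
T_2 = \begin{vmatrix} p_1 & p_3 \\ p_0 & p_2 \end{vmatrix} > 0, \quad
T_3 = \begin{vmatrix} p_1 & p_3 & p_5 \\ p_0 & p_2 & p_4 \\ 0 & p_1 & p_3 \end{vmatrix} > 0, \quad \ldots,$$

$$T_n = \begin{vmatrix}
p_1 & p_3 & \cdots & p_{2n-1} \\
p_0 & p_2 & \cdots & p_{2n-2} \\
0 & p_1 & \cdots & p_{2n-3} \\
0 & p_0 & \cdots & p_{2n-4} \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & p_n
\end{vmatrix} > 0.$$
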

^ E. J. Routh, A Treatise on the Stability of a Given State of Motion (London, 1877),
Chaps. 2 and 3; Advanced Rigid Dynamics, 6th ed., London, 1906, Chap. 6.

* Hurwitz, Math. Ann., Vol. 46 (1895), p. 621.

« R. A. Frazer and W. J. Duncan, Royal Soc. Proc., Series A, Vol. 124 (1929), p. 642.
Also R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, Cambridge
University Press, 1938, pp. 161-166.

The law of formation of these determinants is obvious. In the first row the
odd p's starting with the first are listed. Within each column the p's diminish
one unit at a time. Any p with negative subscript derived by this formula is
treated as zero, and all p's of subscript higher than the degree of the equation
are set equal to zero. With this convention, for $p_0$ made positive, complete and
independent necessary conditions are that all principal minors of $T_n$ formed by
deleting successively the last row and column must be positive. These conditions
are n in number and are independent.
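
The law of formation just described can be tried out numerically. The following is a minimal Python sketch, not the author's procedure: the function names are illustrative assumptions, and the (i, j) entry of the array is taken to be the coefficient with subscript 2j - i, set to zero outside 0..n, as stated in the text.

```python
def hurwitz_matrix(p):
    """Return the n-by-n array whose (i, j) entry (1-based) is p[2j - i].

    p lists the coefficients p_0, ..., p_n; subscripts outside 0..n give zero.
    """
    n = len(p) - 1

    def coeff(k):
        return p[k] if 0 <= k <= n else 0

    return [[coeff(2 * j - i) for j in range(1, n + 1)] for i in range(1, n + 1)]


def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small n)."""
    if not m:
        return 1
    total = 0
    for j, a in enumerate(m[0]):
        if a != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * a * det(minor)
    return total


def leading_minors(p):
    """Determinants of the upper-left k-by-k blocks, k = 1, ..., n."""
    h = hurwitz_matrix(p)
    return [det([row[:k] for row in h[:k]]) for k in range(1, len(h) + 1)]


def real_parts_all_negative(p):
    """Apply the sign conditions: p_0 > 0 and every leading minor > 0."""
    return p[0] > 0 and all(t > 0 for t in leading_minors(p))


print(leading_minors([1, 6, 11, 6]))        # cubic with roots -1, -2, -3
print(real_parts_all_negative([1, -1, 2]))  # quadratic with roots in the right half-plane
```

Cofactor expansion is chosen only for brevity; for equations of high degree an elimination method would be the more economical choice.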


3. Complete, independent, necessary and sufficient conditions. Corresponding
to Routh's first attack on the problem, we might consider an equation
of degree n(n - 1)/2 whose roots equal the products two at a time of the original
equation's. If this equation and the original equation have real roots less than
unity in absolute value, our problem is solved. This is guaranteed if, and only
if, two further transformed equations with roots equal to the squares minus unity
of the roots of the original and derived equations respectively all have positive
coefficients. These conditions are necessary and sufficient, but not independent,
and cannot be easily computed in the general case. Therefore, I follow Routh's
example and approach the problem from a different point of view.

When the roots of f(x) = 0 are plotted in the complex plane, they must all lie
within the unit circle if their absolute values are to be less than unity, and con- 
versely. We might therefore attempt to apply Cauchy’s theorem. However, 
it is not necessary to do so. Routh has shown what the conditions are that there 
be no roots in the right-hand half-plane. Can we find a complex transformation 
of variables which carries the unit circle into the left-hand half-plane? 

The answer is in the affirmative. The linear complex transformation 


(7)  $$x = \frac{z + 1}{z - 1}, \qquad z = \frac{x + 1}{x - 1}$$


will accomplish this. But after substituting for x its value in terms of z, we
cease to have a polynomial but rather a rational function of z as follows:

(8)  $$f(x) = \frac{\sum_{i=0}^{n} p_i (z + 1)^{n-i} (z - 1)^{i}}{(z - 1)^{n}}.$$

We need only consider the polynomial in the numerator, i.e.,

(9)  $$\varphi(z) = \sum_{i=0}^{n} \pi_i z^{n-i} = 0.$$


In order that the roots of the original equation be less than unity in absolute value,
it is necessary and sufficient that the real parts of the roots of equation (9) be negative.
Once we determine the coefficients ($\pi_i$) in terms of the original p's, we can easily
apply Routh's theorems. This yields n + 1 necessary and sufficient conditions,
all of which are independent.

Expanding the numerator of the right-hand side of (8) and collecting terms,
the following explicit formulas for the π's are directly obtained:

(10)  $$\pi_i = \sum_{j=0}^{n} p_j \sum_{k=0}^{\bar{k}} (-1)^{k}\; {}_{j}C_{k}\; {}_{n-j}C_{i-k},$$

where

$${}_{v}C_{w} = \frac{v!}{(v - w)!\, w!}$$

(taken as zero when w exceeds v), and

$\bar{k}$ = the smaller of i and j.

For fourth and higher degree equations literal substitution, while always
possible, results in complicated expressions. It is preferable, therefore, to
compute the π's numerically and then apply the conditions of (6) directly.
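
The numerical computation of the π's recommended here can be sketched in a few lines of Python. This is an illustrative sketch only; the function names are assumptions, and the coefficients are obtained by expanding the numerator of (8) term by term rather than from a closed formula. The sample coefficients are those of the fourth order equation treated in Section 4.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in descending powers."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def transformed_coefficients(p):
    """Coefficients pi_0, ..., pi_n of sum_i p_i (z+1)^(n-i) (z-1)^i, descending in z."""
    n = len(p) - 1
    pi = [0.0] * (n + 1)
    for i, p_i in enumerate(p):
        term = [1.0]
        for _ in range(n - i):
            term = poly_mul(term, [1.0, 1.0])    # factor (z + 1)
        for _ in range(i):
            term = poly_mul(term, [1.0, -1.0])   # factor (z - 1)
        for k, c in enumerate(term):
            pi[k] += p_i * c                     # term[k] multiplies z^(n-k)
    return pi


example = [1.0, -0.398, 0.220, -0.013, -0.027]
print([round(c, 3) for c in transformed_coefficients(example)])
```

The resulting list can then be passed to any routine that applies the sign conditions of (6).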

Other necessary conditions can be easily derived, but they will be dependent
upon these. Thus, each π must be positive; but this is not, by itself, sufficient.
Or, adding $\pi_0$ and $\pi_n$ we find

(11)  $$\pi_0 + \pi_n = 2(p_0 + p_2 + p_4 + \cdots) > 0,$$

i.e., the sum of the even p's must be positive. Similarly, still other linear sums
of other π's will result in cancellation of certain of the p's. Except on special
occasions there is probably no labor saved by utilizing conditions derived in
this way.

One obvious but useful necessary condition will be stated without proof.
If one forms polynomials from subsets of the coefficients of a given "stable"
polynomial formed by arbitrary "cuts" which leave adjacent coefficients in
unchanged order and introduce no gaps within each set, then the resulting
polynomials will all be stable.

Special sufficiency conditions also can be developed. Carmichael* presents
certain inequalities between the absolute values of the largest root and the
coefficients of the original equation. For special problems these may be fruitfully
applied.

4. Example. In conclusion I apply the conditions derived here to a well-known
numerical equation determined statistically by Tinbergen* in the analysis
of economic fluctuations. It is a fourth order difference equation with constant
coefficients,

(12)  $$Z_t - .398 Z_{t-1} + .220 Z_{t-2} - .013 Z_{t-3} - .027 Z_{t-4} = 0$$

* R. D. Carmichael, Amer. Math. Soc. Bull., Vol. 24 (1918), pp. 286-296.

* J. Tinbergen, Business Cycles in the United States, 1919-1932, League of Nations, 1939,
p. 140.

with the associated indicial equation

(13)  $$f(x) = x^4 - .398 x^3 + .220 x^2 - .013 x - .027 = 0.$$

Its roots have been computed and are known to be less than unity in absolute
value. This may be verified by computing


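(14)
$$\pi_0 = 0.782 > 0, \quad \pi_1 = 3.338 > 0, \quad \pi_2 = 5.398 > 0, \quad \pi_3 = 4.878 > 0, \quad \pi_4 = 1.604 > 0,$$

$$\pi_1\pi_2 - \pi_0\pi_3 = 14.204 > 0, \qquad \pi_3(\pi_1\pi_2 - \pi_0\pi_3) - \pi_1\pi_3\pi_4 = 43.177 > 0.$$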


To compute the same results by cross-multiplication the work is arranged as
follows:

(15)
    \pi_0 = .782            \pi_2 = 5.398            \pi_4 = 1.604
    \pi_1 = 3.338           \pi_3 = 4.878
    \pi_1\pi_2 - \pi_0\pi_3 = 14.204                 \pi_3\pi_4 - 0 = 7.824
    \pi_3(\pi_1\pi_2 - \pi_0\pi_3) - \pi_1\pi_3\pi_4 = 43.177


It may be remarked that the presence of a negative coefficient anywhere in
the table is an immediate indication of instability, and that there is no necessity
to continue the computation until a negative sign appears in a leading coefficient.
This fact often saves much labor.


VALUES OF MILLS' RATIO OF AREA TO BOUNDING ORDINATE AND OF
THE NORMAL PROBABILITY INTEGRAL FOR LARGE VALUES
OF THE ARGUMENT

By Robert D. Gordon
Scripps Institution of Oceanography

A pair of simple inequalities is proved which constitute upper and lower
bounds for the ratio $R_x$,* valid for x > 0. The writer has failed to encounter
these inequalities in the literature, hence it seems worthwhile to present them
for whatever value they may have.


* J. P. Mills, "Table of ratio: area to bounding ordinate, for any portion of the normal
curve," Biometrika, Vol. 18 (1926), pp. 395-400. Also Pearson's tables, Part II, Table III,

The function $R_x$ is defined by

(1)  $$R_x = e^{x^2/2} \int_x^{\infty} e^{-t^2/2}\, dt.$$


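The following relations between $R = R_x$ and its derivatives are easily established
by direct differentiations and substitutions:

(2)  $$\frac{dR}{dx} = xR - 1,$$

(3)  $$\frac{d^2R}{dx^2} = x\frac{dR}{dx} + R = \frac{x^2 + 1}{x}\,\frac{dR}{dx} + \frac{1}{x},$$

(4)  $$\frac{d^3R}{dx^3} = \frac{x(x^2 + 3)}{x^2 + 1}\,\frac{d^2R}{dx^2} - \frac{2}{x^2 + 1}.$$

Also by ordinary rules

(5)  $$R_x > 0,$$

(6)  $$\lim_{x \to \infty} x R_x = 1.$$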


1°. Suppose that at any point $x_1 > 0$, $x_1 R_{x_1} > 1$. Then by (2) $dR/dx > 0$,
and $R_x$ would continue to increase with increasing x: still more, $xR_x$ would
continue to increase, hence we should have $xR_x \ge x_1 R_{x_1} > 1$ for $x \ge x_1$, which
contradicts (6). Therefore we find $xR_x \le 1$ for x > 0, and

(7)  $$R_x \le \frac{1}{x},$$

which establishes the required upper inequality.

2°. Suppose that at any point $x_2 > 0$, $d^2R/dx^2 < 0$. Then by (4) $d^3R/dx^3 =
(d/dx)(d^2R/dx^2) < 0$ at this point. Since these derivatives are continuous this
implies that for all $x > x_2$, $d^2R/dx^2 < [d^2R/dx^2]_{x_2} < 0$. Then we get the
inequalities, for $x > x_2$,

$$R_x < R_{x_2} + (x - x_2)\left[\frac{dR}{dx}\right]_{x_2} + \frac{1}{2}(x - x_2)^2\left[\frac{d^2R}{dx^2}\right]_{x_2},$$

where $[\ ]_{x_2}$ indicates evaluation at $x = x_2$. Since $[d^2R/dx^2]_{x_2} < 0$, this implies
that for sufficiently large x, $R_x < 0$, which contradicts (5). It follows then
that (3) is positive, and substitution of (2) gives

(8)  $$R_x \ge \frac{x}{x^2 + 1}.$$

We combine (7) and (8) in the double inequality:

(9)  $$\frac{x}{x^2 + 1} \le R_x \le \frac{1}{x}.$$

This gives for the probability integral the corresponding inequality

(10)  $$\frac{x}{x^2 + 1}\,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \le \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt \le \frac{1}{x}\,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

It can easily be shown (for x > 0) that equalities in (9) and (10) are impossible.
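
The double inequality for the ratio can be checked numerically. The sketch below is an illustration only, under the assumption that the ratio is evaluated through the complementary error function of Python's standard library, using the identity that the integral of exp(-t^2/2) from x to infinity equals sqrt(pi/2) erfc(x/sqrt 2); the function names are hypothetical.

```python
import math


def mills_ratio(x):
    """R_x = exp(x^2/2) * integral from x to infinity of exp(-t^2/2) dt."""
    return math.sqrt(math.pi / 2.0) * math.exp(x * x / 2.0) * math.erfc(x / math.sqrt(2.0))


def bounds(x):
    """Lower and upper bounds x/(x^2+1) and 1/x for x > 0."""
    return x / (x * x + 1.0), 1.0 / x


for x in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    lower, upper = bounds(x)
    print(x, lower, mills_ratio(x), upper)
```

For very large arguments the factor exp(x^2/2) overflows in floating point, so this direct evaluation is suitable only for moderate x.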



DISTRIBUTION OF THE RATIO OF THE MEAN SQUARE SUCCESSIVE
DIFFERENCE TO THE VARIANCE


By John von Neumann


Institute for Advanced Study^

1. Introduction. Let $x_1, \ldots, x_n$ be variables representing n successive
observations in a population which obeys a distribution law

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \xi)^2/2\sigma^2}\, dx,$$

i.e. which is normal, with the mean $\xi$ and the standard deviation $\sigma$. For the
sample we define as usual the mean,

$$\bar{x} = \frac{1}{n} \sum_{\mu=1}^{n} x_\mu,$$

the variance,

$$s^2 = \frac{1}{n} \sum_{\mu=1}^{n} (x_\mu - \bar{x})^2,$$

and also the mean square successive difference

$$\delta^2 = \frac{1}{n - 1} \sum_{\mu=1}^{n-1} (x_{\mu+1} - x_\mu)^2.$$

The reasons for the study of the distribution of the mean square successive
difference $\delta^2$, in itself as well as in its relationship to the variance $s^2$, have been
set forth in a previous publication,• to which the reader is referred. The distribution
of $\delta^2$, and in particular its moments, were also studied there. The present
paper is devoted to the investigation of the ratio

$$\eta = \frac{\delta^2}{s^2}.$$

A comparison of the observed value of $\eta$ with that distribution is particularly
suited as a basis of the judgment whether the observations $x_1, \ldots, x_n$ are
independent or whether a trend exists. (Cf. sections 1 and 2, loc. cit.•)
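
As an illustration of the definitions above, the ratio can be computed for a given sample as follows. This is a minimal Python sketch; the function name is an assumption, and the variance uses the divisor n as in the definition of the text.

```python
def successive_difference_ratio(xs):
    """eta = delta^2 / s^2, with s^2 using divisor n and delta^2 using divisor n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n
    d2 = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return d2 / s2   # fails with ZeroDivisionError if all observations are equal


print(successive_difference_ratio([1.0, 2.0, 3.0, 4.0]))    # steady trend
print(successive_difference_ratio([1.0, -1.0, 1.0, -1.0]))  # alternating values
```

A steady trend gives a small ratio and an alternating sequence a large one, which is the sense in which the ratio indicates the presence of a trend.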

The moments of $\eta$ have already been determined by J. D. Williams by a
different method.* Williams' results have been checked by W. J. Dixon at the
suggestion of S. S. Wilks, whose stimulating interest has been largely responsible
for the undertaking of the series of papers on $\delta^2$ and $\delta^2/s^2$. The present rather
exhaustive discussion, however, brings out several other essential characteristics
of this statistic, and provides the key to some very effective computational
methods. It is further hoped that the reader will find that the mathematical
methods used and the generalizations indicated have an interest of their own.

From the latter point of view the final results of sections 5 and 7, concerning
the distribution of values of quadratic and of Hermitian forms, may deserve
special attention.

^ Also Scientific Advisory Committee of the Ballistic Research Laboratory, Aberdeen
Proving Ground.

• John von Neumann, R. H. Kent, H. R. Bellinson, B. I. Hart, "The mean square
successive difference," Annals of Math. Stat., Vol. 12 (1941), pp. 153-162.

2. Diagonalization of the quadratic forms and replacement by a spherical
mean. Since $\delta^2$ and $s^2$ are unchanged when we replace each $x_\mu$ by $x_\mu - \xi$, we
may assume $\xi = 0$. Then the distribution law of x is

$$C e^{-x^2/2\sigma^2}\, dx \quad (C = 1/\sigma\sqrt{2\pi}), \text{ and that of } x_1, \ldots, x_n \text{ is } \prod_{\mu=1}^{n} C e^{-x_\mu^2/2\sigma^2}\, dx_\mu,$$

i.e.

$$C^n e^{-\sum_{\mu=1}^{n} x_\mu^2/2\sigma^2}\, dx_1 \cdots dx_n.$$

Any linear orthogonal transformation of the $x_1, \ldots, x_n$ leaves $\sum_{\mu=1}^{n} x_\mu^2$ and
$dx_1 \cdots dx_n$ unchanged, hence the above distribution law will likewise be left
unchanged. Thus, we may subject the two quadratic forms $\delta^2$, $s^2$ to any
simultaneous linear, orthogonal transformation.

Consider one carrying $x_1, \ldots, x_n$ into, say $x'_1, \ldots, x'_n$, which brings the
quadratic form $(n - 1)\delta^2$ into the diagonal form, say $\sum_{\mu=1}^{n} A_\mu x_\mu'^2$. Such a
transformation does not affect the characteristic values of the quadratic forms*, and
these characteristic values are obviously $A_1, \ldots, A_n$ in the case of $\sum_{\mu=1}^{n} A_\mu x_\mu'^2$.
Consequently $A_1, \ldots, A_n$ are the characteristic values of the original quadratic
form $(n - 1)\delta^2$. We shall determine them as such in the next section.

Clearly we always have $(n - 1)\delta^2 \ge 0$, hence all $A_\mu \ge 0$. Some may


* J. D. Williams, "Moments of the ratio of the mean square successive difference to the
mean square difference in samples from a normal universe," Annals of Math. Stat., Vol. 12
(1941), pp. 239-241. Cf. also L. C. Young, "On randomness in ordered sequences," Annals
of Math. Stat., Vol. 12 (1941), pp. 293-300.

* For the properties of matrices and quadratic forms cf. e.g.: J. H. M. Wedderburn,
Lectures on Matrices, Amer. Math. Soc. Colloquium Publications, Vol. 17, New York, 1934.
In the present context cf. mainly Chapters II and VI.

equal 0, say k (= 0, 1, ..., n) of them, which we can arrange to be $A_{n-k+1}, \ldots, A_n$.
$(n - 1)\delta^2 = 0$ is thus equivalent to $x'_1 = \cdots = x'_{n-k} = 0$, i.e. to n - k independent
conditions. On the other hand this amounts obviously to $x_1 = \cdots = x_n$, and these
are n - 1 independent conditions. So k = 1 and consequently
$A_1, \ldots, A_{n-1} > 0$, $A_n = 0$. And our linear orthogonal transformation must
carry the x-vectors with $x_1 = \cdots = x_n$ into the x'-vectors with
$x'_1 = \cdots = x'_{n-1} = 0$. Among the former, $\pm(1/\sqrt{n}, \ldots, 1/\sqrt{n})$ has the length 1; among
the latter only $0, \ldots, 0, \pm 1$ have. Hence these correspond to each other.
Now the scalar (inner) product of two vectors is an orthogonal invariant; that
of a vector $x_1, \ldots, x_n$ with $1/\sqrt{n}, \ldots, 1/\sqrt{n}$ is $\sqrt{n}\,\bar{x}$, that of a vector
$x'_1, \ldots, x'_n$ with $0, \ldots, 0, \pm 1$ is $\pm x'_n$, hence

$$\sqrt{n}\,\bar{x} = \pm x'_n.$$

Put $x_\mu = \bar{x} + (x_\mu - \bar{x})$. Then clearly $\sum_{\mu=1}^{n} (x_\mu - \bar{x}) = 0$. Hence

$$\sum_{\mu=1}^{n} x_\mu^2 = n\bar{x}^2 + \sum_{\mu=1}^{n} (x_\mu - \bar{x})^2 = n\bar{x}^2 + n s^2.$$

Owing to the orthogonality, the left-hand side is equal to $\sum_{\mu=1}^{n} x_\mu'^2$, therefore,
since $n\bar{x}^2 = x_n'^2$,

$$\sum_{\mu=1}^{n-1} x_\mu'^2 = n s^2.$$

Remembering that $A_n = 0$, we also have

$$(n - 1)\delta^2 = \sum_{\mu=1}^{n-1} A_\mu x_\mu'^2.$$

Consequently

$$\eta = \frac{\delta^2}{s^2} = \frac{n}{n - 1}\, \frac{\sum_{\mu=1}^{n-1} A_\mu x_\mu'^2}{\sum_{\mu=1}^{n-1} x_\mu'^2}.$$

The distribution law is, as we know, the same in $x'_1, \ldots, x'_n$ as in $x_1, \ldots, x_n$,
namely

$$C^n e^{-\sum_{\mu=1}^{n} x_\mu'^2/2\sigma^2}\, dx'_1 \cdots dx'_n.$$

Thus $x'_1, \ldots, x'_n$ are independent. $\eta$ depends on $x'_1, \ldots, x'_{n-1}$ only, hence we
may disregard $x'_n$ altogether, and use the distribution law of the $x'_1, \ldots, x'_{n-1}$,


$$C^{n-1} e^{-\sum_{\mu=1}^{n-1} x_\mu'^2/2\sigma^2}\, dx'_1 \cdots dx'_{n-1}.$$

With respect to $x'_1, \ldots, x'_{n-1}$ we may now state that the $x'_1, \ldots, x'_{n-1}$
distribution of $\eta$ can be obtained by determining first the distribution of $\eta$ over
every spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu'^2 = r^2,$$

and then averaging these distributions with the weights w(r) dr, where w(r) dr
is the probability of the spherical shell from r to r + dr with respect to our
original $x'_1, \ldots, x'_{n-1}$ distribution law. (It happens to be proportional to
$r^{n-2} e^{-r^2/2\sigma^2}\, dr$, but this is immaterial.)

Since the $x'_1, \ldots, x'_{n-1}$ distribution law is obviously spherically symmetric
in these variables, the first-mentioned distributions over the spherical surfaces
are readily obtained by assigning each piece of the surfaces in question its own
relative, (n - 2)-dimensional area as weight.

Since $\eta$ is a homogeneous function of $x'_1, \ldots, x'_{n-1}$ of order zero, these spherical
surface distributions of $\eta$ are the same for all r. Consequently we can replace
all these r by, say r = 1, and the subsequent averaging over the r may be omitted
altogether.

Finally, since we restrict ourselves to r = 1, i.e. to the spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu'^2 = 1,$$

the denominator of $\eta$ may be omitted and we have

$$\eta = \frac{n}{n - 1} \sum_{\mu=1}^{n-1} A_\mu x_\mu'^2.$$

We sum up, writing again $x_1, \ldots, x_{n-1}$ for $x'_1, \ldots, x'_{n-1}$: then the desired
distribution of $\eta$ is that of

$$\eta = \frac{n}{n - 1} \sum_{\mu=1}^{n-1} A_\mu x_\mu^2,$$

where the point $x_1, \ldots, x_{n-1}$ is uniformly distributed over the spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu^2 = 1.$$

Here $A_1, \ldots, A_{n-1}$ are all positive, and together with 0 they are the
characteristic values of the quadratic form

$$(n - 1)\delta^2 = \sum_{\mu=1}^{n-1} (x_{\mu+1} - x_\mu)^2 = x_1^2 + 2\sum_{\mu=2}^{n-1} x_\mu^2 + x_n^2 - 2\sum_{\mu=1}^{n-1} x_\mu x_{\mu+1}.$$

3. The characteristic values; first orientation concerning $\eta$. We have
shown that there exist (counting multiplicities) precisely n - 1 positive roots
A of the characteristic equation

$$\begin{vmatrix}
A - 1 & 1 & & & & \\
1 & A - 2 & 1 & & & \\
 & 1 & A - 2 & 1 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & 1 & A - 2 & 1 \\
 & & & & 1 & A - 1
\end{vmatrix} = 0$$

(the empty places are filled with zeros), and that these roots are the
$A_1, \ldots, A_{n-1}$.

Such an A is characterized by the possibility of solving the equations

$$(A - 1)x_1 + x_2 = 0, \quad x_1 + (A - 2)x_2 + x_3 = 0, \quad x_2 + (A - 2)x_3 + x_4 = 0,$$
$$\ldots, \quad x_{n-2} + (A - 2)x_{n-1} + x_n = 0, \quad x_{n-1} + (A - 1)x_n = 0,$$

in $x_1, \ldots, x_n$ not all equal to zero. Put

$$x_0 = x_1, \qquad x_{n+1} = x_n,$$

and

$$A = 2 - 2\cos\alpha,$$

then these equations become

$$x_{\mu-1} + x_{\mu+1} = 2\cos\alpha \cdot x_\mu \quad \text{for } \mu = 1, 2, \ldots, n - 1, n.$$

The last equation is satisfied by

$$x_\mu = 2\cos(\mu - \tfrac{1}{2})\alpha \quad \text{for } \mu = 0, 1, 2, \ldots, n - 1, n, n + 1.$$

Now $x_0 = x_1$ is automatically fulfilled, while $x_{n+1} = x_n$ demands
$\cos(n + \tfrac{1}{2})\alpha = \cos(n - \tfrac{1}{2})\alpha$. This is certainly the case when
$(n + \tfrac{1}{2})\alpha = 2k\pi - (n - \tfrac{1}{2})\alpha$ (k any integer), i.e. $\alpha = k\pi/n$. For no
k = 1, ..., n - 1 are $x_1, \ldots, x_n$ all equal to zero (indeed $x_1 = 2\cos(k\pi/2n) > 0$),
hence these k give A's of the desired kind. They are

$$A = 2 - 2\cos\frac{k\pi}{n} = 4\sin^2\frac{k\pi}{2n} \quad (k = 1, \ldots, n - 1),$$
and so they are all positive and different from each other. Their number is
n - 1. Hence they are precisely $A_1, \ldots, A_{n-1}$.

So we have shown

$$A_\mu = 4\sin^2\frac{\mu\pi}{2n} \quad (\mu = 1, \ldots, n - 1).$$

We can now reformulate the final result of the preceding section.
Let us set

$$\eta = \frac{2n}{n - 1}(1 - \epsilon),$$

$$\epsilon = \sum_{\mu=1}^{n-1} \cos\frac{\mu\pi}{n}\, x_\mu^2,$$

where the point $x_1, \ldots, x_{n-1}$ is uniformly distributed over the spherical surface

$$\sum_{\mu=1}^{n-1} x_\mu^2 = 1.$$

Replacement of $x_\mu$ by $x_{n-\mu}$ carries $\epsilon$ into $-\epsilon$. Therefore the distribution of
$\epsilon$ is symmetric around 0. Hence the mean of $\epsilon$ is 0. The maximum of $\epsilon$'s
distribution is clearly $\cos(\pi/n)$, its minimum is $\cos((n-1)\pi/n) = -\cos(\pi/n)$. We state
these facts, together with their equivalents for $\eta$.

$\epsilon$ ($\eta$)'s distribution is symmetric around its mean, which is 0 $\left(\frac{2n}{n-1}\right)$;
the maximum of $\epsilon$ ($\eta$)'s distribution is

$$\cos\frac{\pi}{n} \quad \left(\frac{2n}{n - 1}\left[1 + \cos\frac{\pi}{n}\right] = \frac{4n}{n - 1}\cos^2\frac{\pi}{2n}\right),$$

its minimum is

$$-\cos\frac{\pi}{n} \quad \left(\frac{2n}{n - 1}\left[1 - \cos\frac{\pi}{n}\right] = \frac{4n}{n - 1}\sin^2\frac{\pi}{2n}\right).$$

Thus it will be easier to obtain information concerning $\eta$ by considering the
distribution of $\epsilon$, since all odd moments of $\epsilon$ are zero, etc. The investigation of
$\epsilon$ instead of $\eta$ was first suggested by B. I. Hart, who also found that the first
four odd moments of $\epsilon$ vanish. R. H. Kent and B. I. Hart also determined the
minima and maxima of these distributions for certain small values of n.
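
The characteristic values found in this section can be verified directly. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the matrix is that of the quadratic form (n - 1)δ² (diagonal 1, 2, ..., 2, 1 with -1 next to the diagonal), and the vectors are the cosine solutions of the text, with k = 0 giving the characteristic value 0.

```python
import math


def apply_form_matrix(x):
    """Multiply x by the tridiagonal matrix of (n-1)*delta^2."""
    n = len(x)
    out = []
    for i in range(n):
        diag = 1.0 if i in (0, n - 1) else 2.0
        v = diag * x[i]
        if i > 0:
            v -= x[i - 1]
        if i < n - 1:
            v -= x[i + 1]
        out.append(v)
    return out


def characteristic_pair(n, k):
    """Characteristic value 4 sin^2(k pi / 2n) and vector x_mu = 2 cos((mu - 1/2) k pi / n)."""
    alpha = k * math.pi / n
    value = 4.0 * math.sin(alpha / 2.0) ** 2
    vector = [2.0 * math.cos((mu - 0.5) * alpha) for mu in range(1, n + 1)]
    return value, vector


def residual(n, k):
    """Largest component of (matrix * x - value * x); zero up to rounding."""
    value, x = characteristic_pair(n, k)
    y = apply_form_matrix(x)
    return max(abs(y[i] - value * x[i]) for i in range(n))


print(max(residual(7, k) for k in range(7)))
```

The check is meant for n of at least 2; for n = 1 the quadratic form vanishes identically.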


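4. Direct computation of the moments. We shall investigate the distribution
law of a quantity

$$\gamma = \sum_{\mu=1}^{m} B_\mu x_\mu^2,$$

where the point $x_1, \ldots, x_m$ is equidistributed over the spherical surface

$$\sum_{\mu=1}^{m} x_\mu^2 = 1.$$

Our above $\epsilon$ obtains by putting m = n - 1 and $B_\mu = \cos(\mu\pi/n)$.

We denote the mean of any function

$$f(x_1, \ldots, x_m)$$

over the above-mentioned spherical surface (the $x_1, \ldots, x_m$ being equidistributed
over it) by

$$\overline{f(x_1, \ldots, x_m)}.$$

Our primary objective is to determine the moments of this distribution

$$M_p = \overline{\gamma^p} = \overline{\Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^p} \qquad (p = 0, 1, 2, \ldots).$$

Let us write $\Sigma_m$ for the (m - 1-dimensional) area of the above-mentioned
spherical surface (of the unit sphere in m-dimensional Euclidean space).

Now we form the function

$$I(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\sum_{\mu=1}^{m} x_\mu^2 + z\sum_{\mu=1}^{m} B_\mu x_\mu^2}\, dx_1 \cdots dx_m.$$

(This integral, as well as all others which we are going to derive from it, is
obviously convergent, as long as z is sufficiently small. More precisely this is true
when

$$|z| \cdot \mathrm{Max}(|B_1|, \ldots, |B_m|) < 1.$$

We shall use them only in the neighborhood of z = 0.) Now clearly

$$\left[\frac{d^p I(z)}{dz^p}\right]_{z=0} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^p e^{-\sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m$$

$$= \Sigma_m M_p \int_0^{\infty} r^{2p} e^{-r^2} r^{m-1}\, dr = \tfrac{1}{2}\Sigma_m M_p \int_0^{\infty} e^{-u} u^{p + \frac{m}{2} - 1}\, du = \tfrac{1}{2}\Sigma_m M_p\, \Gamma\Big(p + \frac{m}{2}\Big).$$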

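• Introduce the new integration variable $u = r^2$.

On the other hand

$$I(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\sum_{\mu=1}^{m} (1 - B_\mu z) x_\mu^2}\, dx_1 \cdots dx_m = \prod_{\mu=1}^{m} \int_{-\infty}^{\infty} e^{-(1 - B_\mu z) x_\mu^2}\, dx_\mu$$

$$= \prod_{\mu=1}^{m} (1 - B_\mu z)^{-\frac{1}{2}} \int_0^{\infty} e^{-u} u^{-\frac{1}{2}}\, du = \prod_{\mu=1}^{m} (1 - B_\mu z)^{-\frac{1}{2}}\, \Gamma(\tfrac{1}{2}) = \Gamma(\tfrac{1}{2})^m\, \psi(z)^{-\frac{1}{2}},$$

where

$$\psi(z) = \prod_{\mu=1}^{m} (1 - B_\mu z).$$

Thus

$$\tfrac{1}{2}\Sigma_m M_p\, \Gamma\Big(p + \frac{m}{2}\Big) = \Gamma(\tfrac{1}{2})^m \left[\frac{d^p}{dz^p}\, \psi(z)^{-\frac{1}{2}}\right]_{z=0}.$$

For p = 0 this becomes, since $M_0 = 1$, $\psi(0) = 1$,

$$\tfrac{1}{2}\Sigma_m\, \Gamma\Big(\frac{m}{2}\Big) = \Gamma(\tfrac{1}{2})^m.$$

Dividing the former equation by the latter gives, since

$$\frac{\Gamma(p + \frac{m}{2})}{\Gamma(\frac{m}{2})} = \frac{m}{2}\Big(\frac{m}{2} + 1\Big) \cdots \Big(\frac{m}{2} + p - 1\Big),$$

$$M_p = \frac{\left[\frac{d^p}{dz^p}\, \psi(z)^{-\frac{1}{2}}\right]_{z=0}}{\frac{m}{2}\big(\frac{m}{2} + 1\big) \cdots \big(\frac{m}{2} + p - 1\big)}.$$

In order to make a practical use of the above formula, we compute

$$\ln\big(\psi(z)^{-\frac{1}{2}}\big) = -\tfrac{1}{2}\sum_{\mu=1}^{m} \ln(1 - B_\mu z) = \sum_{l=1}^{\infty} \Big(\frac{1}{2l}\sum_{\mu=1}^{m} B_\mu^l\Big) z^l.$$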



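® Introduce the new integration variable $u = (1 - B_\mu z) x_\mu^2$.

Write

$$\alpha_l = \frac{1}{2l}\sum_{\mu=1}^{m} B_\mu^l,$$

then

$$\ln\big(\psi(z)^{-\frac{1}{2}}\big) = \sum_{l=1}^{\infty} \alpha_l z^l,$$

and so

$$\psi(z)^{-\frac{1}{2}} = e^{\sum_{l=1}^{\infty} \alpha_l z^l} = 1 + \beta_1 z + \beta_2 z^2 + \beta_3 z^3 + \cdots.$$

Clearly

$$M_p = \frac{1 \cdot 2 \cdots p}{\frac{m}{2}\big(\frac{m}{2} + 1\big) \cdots \big(\frac{m}{2} + p - 1\big)}\, \beta_p,$$

$$\beta_1 = \alpha_1,$$
$$\beta_2 = \alpha_2 + \tfrac{1}{2}\alpha_1^2,$$
$$\beta_3 = \alpha_3 + \alpha_1\alpha_2 + \tfrac{1}{6}\alpha_1^3,$$
$$\beta_4 = \alpha_4 + \tfrac{1}{2}\alpha_2^2 + \alpha_1\alpha_3 + \tfrac{1}{2}\alpha_1^2\alpha_2 + \tfrac{1}{24}\alpha_1^4.$$

In our application (cf. above)

$$B_{m+1-\mu} = -B_\mu.$$

This has the consequence that

$$\alpha_1 = \alpha_3 = \alpha_5 = \cdots = 0.$$

Thus the z functions we compute contain only even powers of z and consequently

$$\beta_1 = \beta_3 = \beta_5 = \cdots = 0,$$
$$M_1 = M_3 = M_5 = \cdots = 0,$$

and

$$\beta_2 = \alpha_2,$$
$$\beta_4 = \alpha_4 + \tfrac{1}{2}\alpha_2^2,$$
$$\beta_6 = \alpha_6 + \alpha_2\alpha_4 + \tfrac{1}{6}\alpha_2^3,$$
$$\beta_8 = \alpha_8 + \tfrac{1}{2}\alpha_4^2 + \alpha_2\alpha_6 + \tfrac{1}{2}\alpha_2^2\alpha_4 + \tfrac{1}{24}\alpha_2^4.$$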

As mentioned before, we actually have m = n - 1 and $B_\mu = \cos(\mu\pi/n)$.
Consequently, for even l,*

$$\alpha_l = \frac{1}{2l}\sum_{\mu=1}^{n-1} \cos^l\frac{\mu\pi}{n} = \frac{1}{2l}\sum_{\mu=0}^{n-1} \cos^l\frac{\mu\pi}{n} - \frac{1}{2l} = \frac{1}{2^{l+1} l}\sum_{k=0}^{l} \binom{l}{k} \sum_{\mu=0}^{n-1} e^{i(2k - l)\mu\pi/n} - \frac{1}{2l}.$$

The inner sum has obviously these values

= n if $k - \tfrac{1}{2}l$ is divisible by n,
= 0 otherwise.

Consequently

$$\alpha_l = \frac{n}{2^{l+1} l}\, {\sum_k}' \binom{l}{k} - \frac{1}{2l},$$

where ${\sum_k}'$ extends over those k = 0, ..., l, for which $k - \tfrac{1}{2}l$ is divisible by n.

Let us now determine the k occurring in the following sum (as above, $k - \tfrac{1}{2}l$
is divisible by n). $k = \tfrac{1}{2}l$ is clearly one of them. All others are of the form
$k = \tfrac{1}{2}l \pm hn$, h = 1, 2, .... The term contributed is the same for + and
for -, since

$$\binom{l}{\tfrac{1}{2}l + hn} = \binom{l}{\tfrac{1}{2}l - hn}.$$

So we have

$$\alpha_l = \begin{cases} 0 & \text{for } l \text{ odd,} \\[4pt] \dfrac{n}{2^{l+1} l}\left[\dbinom{l}{\tfrac{1}{2}l} + 2\sum_h \dbinom{l}{\tfrac{1}{2}l + hn}\right] - \dfrac{1}{2l} & \text{for } l \text{ even.} \end{cases}$$

* As pointed out above, we need to consider only the even l.

The number of terms which the sum $\sum_h$ contributes depends on the comparative
sizes of l and n. The number is clearly

0 for $\tfrac{1}{2}l < n$,

1 for $n \le \tfrac{1}{2}l < 2n$,

2 for $2n \le \tfrac{1}{2}l < 3n$,


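Explicit formulae follow:*

$$\alpha_1 = \alpha_3 = \alpha_5 = \alpha_7 = \alpha_9 = \cdots = 0,$$

$$\alpha_2 = \frac{n - 2}{8}, \qquad (0 \text{ for } n = 1),$$

$$\alpha_4 = \frac{3n - 8}{64}, \qquad (0 \text{ for } n = 1, 2),$$

$$\alpha_6 = \frac{5n - 16}{192}, \qquad \Big(0 \text{ for } n = 1, 2;\ \tfrac{1}{384} \text{ for } n = 3\Big),$$

$$\alpha_8 = \frac{35n - 128}{2048}, \qquad \Big(0 \text{ for } n = 1, 2;\ \tfrac{1}{2048} \text{ for } n = 3;\ \tfrac{1}{128} \text{ for } n = 4\Big),$$

$$\beta_1 = \beta_3 = \beta_5 = \beta_7 = \beta_9 = \cdots = 0,$$

$$\beta_2 = \frac{n - 2}{8}, \qquad (0 \text{ for } n = 1),$$

$$\beta_4 = \frac{n^2 + 2n - 12}{128}, \qquad (0 \text{ for } n = 1, 2),$$

$$\beta_6 = \frac{n^3 + 12n^2 + 8n - 168}{3072}, \qquad \Big(0 \text{ for } n = 1, 2;\ \tfrac{5}{1024} \text{ for } n = 3\Big),$$

$$\beta_8 = \frac{n^4 + 28n^3 + 212n^2 - 64n - 3696}{98304}, \qquad \Big(0 \text{ for } n = 1, 2;\ \tfrac{35}{32768} \text{ for } n = 3;\ \tfrac{35}{2048} \text{ for } n = 4\Big),$$

$$M_1 = M_3 = M_5 = M_7 = M_9 = \cdots = 0,$$

$$M_2 = \frac{8}{(n - 1)(n + 1)}\,\beta_2 = \frac{n - 2}{(n - 1)(n + 1)}, \qquad (0 \text{ for } n = 1),$$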


* The author wishes to express his thanks to Miss B. I. Hart for her kind help in carrying
out these computations.

$$M_4 = \frac{384}{(n - 1)(n + 1)(n + 3)(n + 5)}\,\beta_4 = \frac{3(n^2 + 2n - 12)}{(n - 1)(n + 1)(n + 3)(n + 5)}, \qquad (0 \text{ for } n = 1, 2),$$

$$M_6 = \frac{46080}{(n - 1)(n + 1)(n + 3)(n + 5)(n + 7)(n + 9)}\,\beta_6 = \frac{15(n^3 + 12n^2 + 8n - 168)}{(n - 1)(n + 1)(n + 3)(n + 5)(n + 7)(n + 9)},$$
$$\Big(0 \text{ for } n = 1, 2;\ \tfrac{5}{1024} \text{ for } n = 3\Big),$$

$$M_8 = \frac{10321920}{(n - 1)(n + 1)(n + 3)(n + 5)(n + 7)(n + 9)(n + 11)(n + 13)}\,\beta_8$$
$$= \frac{105(n^4 + 28n^3 + 212n^2 - 64n - 3696)}{(n - 1)(n + 1)(n + 3)(n + 5)(n + 7)(n + 9)(n + 11)(n + 13)},$$
$$\Big(0 \text{ for } n = 1, 2;\ \tfrac{35}{32768} \text{ for } n = 3;\ \tfrac{112}{21879} \text{ for } n = 4\Big).$$
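
The moments can also be obtained numerically from the definitions of this section, which gives an independent check on the explicit formulae. The Python sketch below is illustrative only: the function names are assumptions, the α's are summed directly from the cosines, and the β's are generated by the standard recurrence for the coefficients of the exponential of a power series, k β_k = Σ j α_j β_{k-j}, rather than by the expanded expressions of the text.

```python
import math


def alpha(n, l):
    """alpha_l = (1 / 2l) * sum over mu = 1..n-1 of cos^l(mu * pi / n)."""
    return sum(math.cos(mu * math.pi / n) ** l for mu in range(1, n)) / (2 * l)


def beta_coefficients(n, pmax):
    """Coefficients beta_0..beta_pmax of exp(sum_l alpha_l z^l)."""
    a = [0.0] + [alpha(n, l) for l in range(1, pmax + 1)]
    b = [1.0] + [0.0] * pmax
    for k in range(1, pmax + 1):
        b[k] = sum(j * a[j] * b[k - j] for j in range(1, k + 1)) / k
    return b


def moments(n, pmax):
    """M_p = p! * beta_p / ((m/2)(m/2 + 1)...(m/2 + p - 1)) with m = n - 1 (n >= 2)."""
    m = n - 1
    b = beta_coefficients(n, pmax)
    result = []
    for p in range(pmax + 1):
        denominator = 1.0
        for j in range(p):
            denominator *= m / 2 + j
        result.append(math.factorial(p) * b[p] / denominator)
    return result


M = moments(10, 4)
print(M[2], 8 / 99)                          # closed form (n-2)/((n-1)(n+1)) at n = 10
print(M[4], 3 * 108 / (9 * 11 * 13 * 15))    # closed form for the fourth moment at n = 10
```

Because the α's are summed directly, the small values of n excepted in the explicit formulae require no special treatment here.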


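We conclude this section by obtaining asymptotic formulae for the distribution
of $\epsilon$ when $n \to \infty$.

In this case our formulae show that all $\alpha_l$ (l even) behave asymptotically like
constant multiples of n. It also appears from our formulae for the $\beta_l$ (l even),
that

$$\beta_l = \frac{\alpha_2^{l/2}}{(\tfrac{1}{2}l)!} + \text{a polynomial in } \alpha_2, \alpha_4, \ldots, \alpha_l \text{ of total order} \le \tfrac{1}{2}l - 1.$$

Consequently $\alpha_2^{l/2}/(\tfrac{1}{2}l)!$ is the dominant term in this expression, and so we have
asymptotically

$$\beta_l \sim \frac{\alpha_2^{l/2}}{(\tfrac{1}{2}l)!} \sim \frac{1}{(\tfrac{1}{2}l)!}\Big(\frac{n}{8}\Big)^{l/2}.$$

From this

$$M_l \sim \frac{l!}{(\tfrac{1}{2}l)!}\,\frac{(n/8)^{l/2}}{(n/2)^{l}} = \frac{l!}{2^{l/2}\, n^{l/2}\, (\tfrac{1}{2}l)!}.$$

Now the normal distribution

$$C_1 e^{-y^2/2\sigma_1^2}\, dy, \qquad C_1 = \frac{1}{\sigma_1\sqrt{2\pi}},$$

with the mean 0 and the standard deviation $\sigma_1$ has the moments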


$$m_l = C_1 \int_{-\infty}^{\infty} y^l e^{-y^2/2\sigma_1^2}\, dy.$$

This is clearly 0 for l odd, while for l even'

$$m_l = 2C_1 \int_0^{\infty} y^l e^{-y^2/2\sigma_1^2}\, dy = 2^{\frac{1}{2}(l+1)} C_1 \sigma_1^{l+1} \int_0^{\infty} e^{-u} u^{\frac{1}{2}(l-1)}\, du = 2^{\frac{1}{2}(l+1)} C_1 \sigma_1^{l+1}\, \Gamma\Big(\frac{l + 1}{2}\Big).$$

For l = 0 this becomes, since $m_0 = 1$,

$$1 = 2^{\frac{1}{2}} C_1 \sigma_1\, \Gamma(\tfrac{1}{2}).$$

Dividing the former equation by the latter gives, since

$$\frac{\Gamma\big(\frac{l+1}{2}\big)}{\Gamma(\tfrac{1}{2})} = \frac{1}{2} \cdot \frac{3}{2} \cdots \frac{l - 1}{2},$$

$$m_l = 1 \cdot 3 \cdots (l - 1)\, \sigma_1^l = \frac{l!}{2^{l/2}\, (\tfrac{1}{2}l)!}\, \sigma_1^l.$$

Comparing the formulae for $M_l$ and for $m_l$ shows that $M_l \sim m_l$ if $\sigma_1^2 = 1/n$,
$\sigma_1 = 1/\sqrt{n}$. So we see:

For $n \to \infty$ the distribution of $\epsilon$ becomes asymptotically normal, with the
mean 0 and the standard deviation $\sigma_1 = 1/\sqrt{n}$. (The same result could be
obtained by applying the general theorems of Liapounoff and others.)
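
The approach to normality can be seen by comparing the exact second and fourth moments of ε with those of a normal law of standard deviation 1/√n. The Python sketch below is illustrative; the function names are assumptions, and only the closed forms for the second and fourth moments (valid for n greater than 2) are used.

```python
def exact_moments(n):
    """Second and fourth moments of epsilon from the explicit formulae (n > 2)."""
    m2 = (n - 2) / ((n - 1) * (n + 1))
    m4 = 3 * (n * n + 2 * n - 12) / ((n - 1) * (n + 1) * (n + 3) * (n + 5))
    return m2, m4


def normal_moments(n):
    """Second and fourth moments of a normal law with mean 0 and variance 1/n."""
    variance = 1.0 / n
    return variance, 3.0 * variance ** 2


for n in (10, 100, 10000):
    (e2, e4), (g2, g4) = exact_moments(n), normal_moments(n)
    print(n, e2 / g2, e4 / g4)   # both ratios approach 1 as n grows
```

The ratios differ from unity by terms of order 1/n, so for small samples the normal approximation is only rough.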


5. The distribution law, general discussion. We return to the quantity $\gamma$,
defined at the beginning of the preceding section, of which our $\epsilon$ is a special case.
We wish to obtain direct information concerning the distribution law of this $\gamma$.
Since a permutation of the $B_\mu$ is permissible, we arrange them such that

$$B_1 \ge B_2 \ge \cdots \ge B_m.$$

(In the special case $\gamma = \epsilon$, the $B_\mu = \cos(\mu\pi/n)$ are given in this arrangement.)

The distribution of $\gamma$ covers obviously the interval

$$B_1 \ge y \ge B_m.$$

And if not $B_1 = \cdots = B_m$, i.e. if $B_1 > B_m$, which we assume to be the case,
then we have obviously a continuous distribution law for $\gamma$ in this interval.
We denote it by $\omega(y)\, dy$.


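' Introduce the new integration variable $u = y^2/2\sigma_1^2$.

Assume for the moment that $B_m > 0$. Then the quantity

$$\gamma^{-\frac{1}{2}m} = \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-\frac{1}{2}m}$$

is bounded, and we can therefore form its mean value. This is the $-\tfrac{1}{2}m$ moment
of $\gamma$ (cf. the beginning of the preceding section)

$$M_{-\frac{1}{2}m} = \overline{\gamma^{-\frac{1}{2}m}} = \int_{B_m}^{B_1} y^{-\frac{1}{2}m}\, \omega(y)\, dy.$$

With any two a > b > 0 (we shall have $a/b \to \infty$ subsequently) form the
quantity

$$t(a, b) = \int \cdots \int_{b^2 \le \sum x_\mu^2 \le a^2} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-\frac{1}{2}m} dx_1 \cdots dx_m = \int_b^a r^{-m} M_{-\frac{1}{2}m}\, \Sigma_m r^{m-1}\, dr = \Sigma_m M_{-\frac{1}{2}m} \int_b^a \frac{dr}{r} = \Sigma_m M_{-\frac{1}{2}m} \ln\frac{a}{b}.$$

Consider next

$$s(a, b) = \int \cdots \int_{b^2 \le \sum B_\mu x_\mu^2 \le a^2} \Big(\sum_{\mu=1}^{m} B_\mu x_\mu^2\Big)^{-\frac{1}{2}m} dx_1 \cdots dx_m = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}} \int \cdots \int_{b^2 \le \sum x_\mu^2 \le a^2} \Big(\sum_{\mu=1}^{m} x_\mu^2\Big)^{-\frac{1}{2}m} dx_1 \cdots dx_m = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}}\, \Sigma_m \ln\frac{a}{b}.$$

Concerning this transformation to polar coordinates and the quantity $\Sigma_m$ cf. the first
part of the preceding section.

Replace each variable $x_\mu$ by $x_\mu/\sqrt{B_\mu}$.
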


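On the other hand, a comparison of their respective integration domains makes
it clear that

$$s(\sqrt{B_m}\, a, \sqrt{B_1}\, b) \le t(a, b) \le s(\sqrt{B_1}\, a, \sqrt{B_m}\, b).$$

Thus

$$\frac{\Sigma_m}{\sqrt{\prod B_\mu}} \ln\frac{\sqrt{B_m}\, a}{\sqrt{B_1}\, b} \le \Sigma_m M_{-\frac{1}{2}m} \ln\frac{a}{b} \le \frac{\Sigma_m}{\sqrt{\prod B_\mu}} \ln\frac{\sqrt{B_1}\, a}{\sqrt{B_m}\, b},$$

i.e.

$$\frac{1}{\sqrt{\prod B_\mu}}\left[1 - \frac{\ln\sqrt{B_1/B_m}}{\ln(a/b)}\right] \le M_{-\frac{1}{2}m} \le \frac{1}{\sqrt{\prod B_\mu}}\left[1 + \frac{\ln\sqrt{B_1/B_m}}{\ln(a/b)}\right].$$

Now let $a/b \to \infty$, then

$$M_{-\frac{1}{2}m} = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}}$$

obtains, i.e.

$$\int_{B_m}^{B_1} y^{-\frac{1}{2}m}\, \omega(y)\, dy = \frac{1}{\sqrt{\prod_{\mu=1}^{m} B_\mu}}.$$

We now drop the assumption $B_m > 0$. We consider instead a real number
z with $z < B_m$. Replace each $B_\mu$ by $B_\mu - z$. Then the one with $\mu = m$
becomes > 0. And $\gamma$ is obviously replaced by $\gamma - z$. Consequently our above
equation is now valid in the form

$$\overline{(\gamma - z)^{-\frac{1}{2}m}} = \int_{B_m}^{B_1} (y - z)^{-\frac{1}{2}m}\, \omega(y)\, dy = \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}}.$$

Let now z be a complex variable. The second term of the above equation is
a (locally) analytical function of z, except in the (real) interval $B_1 \ge z \ge B_m$.
The third term, too, is a (locally) analytical function of z, except at the (real)
points $B_1, \ldots, B_m$. Consequently both are one-valued analytical functions
of z in the simply connected domain which obtains from the complex z plane by
exclusion of the (real) half line

$$z \ge B_m.$$

Hence the equation

(1)  $$\int_{B_m}^{B_1} (y - z)^{-\frac{1}{2}m}\, \omega(y)\, dy = \frac{1}{\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}},$$

which holds for all (real) $z < B_m$, remains true for all complex z of the above
domain.*

We observe next that $\omega(y)$ is an analytical function of y in $B_1 \ge y \ge B_m$,
whenever $y \ne B_1, \ldots, B_m$. This is easily established by using any multiple
integral expression for $\omega(y)$ which, while hard to evaluate explicitly, puts this
analyticity into evidence.*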


* $(y - z)^{-\frac{1}{2}m}$ and the factors $(B_\mu - z)^{-\frac{1}{2}}$ of $1/\sqrt{\prod_{\mu=1}^{m} (B_\mu - z)}$ are those
branches of these analytical functions which are (real and) > 0 when z is (real and)
$< B_m$. When m is even (as it will be, cf. below) the domain of analyticity is somewhat
more extended, but we need not discuss this.

* The computation which follows gives the desired analyticity in a simple way, and also
makes it clear why the analyticity fails at $y = B_1, \ldots, B_m$.

Consider the $y \ne B_1, \ldots, B_m$ in $B_1 \ge y \ge B_m$. The probability of $\gamma \le y$ is
$p(y) = \int_{B_m}^{y} \omega(y)\, dy$, and we may establish its analyticity instead of that of
$p'(y) = \omega(y)$.

Obviously p(y) is equally the probability of $\sum_{\mu=1}^{m} B_\mu x_\mu^2 \le y \sum_{\mu=1}^{m} x_\mu^2$, if the
$x_1, \ldots, x_m$ are equidistributed over a spherical surface $\sum_{\mu=1}^{m} x_\mu^2 = r^2$, with any
given r > 0.

Our hypotheses concerning y imply $B_\nu > y > B_{\nu+1}$ for a suitable $\nu = 1, \ldots, m - 1$.
Consider now the expression

$$f(y) = \int \cdots \int_{\sum (B_\mu - y) x_\mu^2 \le 0} e^{-\sum_{\mu=1}^{m} x_\mu^2}\, dx_1 \cdots dx_m.$$

Transforming to polar coordinates, we obtain

$$f(y) = \int_0^{\infty} e^{-r^2}\, \Sigma_m\, p(y)\, r^{m-1}\, dr = \Sigma_m \int_0^{\infty} e^{-r^2} r^{m-1}\, dr \cdot p(y).$$

($\Sigma_m$ as before.) Hence it suffices to establish the analyticity of f(y). Now on the other
hand

$$f(y) = \int \cdots \int_{\sum (B_\mu - y) x_\mu^2 \le 0} e^{-\sum x_\mu^2}\, dx_1 \cdots dx_m = \frac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y|}} \int \cdots \int_{\sum_{\mu \le \nu} w_\mu^2 - \sum_{\mu > \nu} w_\mu^2 \le 0} e^{-\sum_{\mu=1}^{m} w_\mu^2/|B_\mu - y|}\, dw_1 \cdots dw_m.$$

(We introduced the new variables $w_\mu = \sqrt{|B_\mu - y|}\; x_\mu$.) And this expression is clearly
analytical in y, since $B_\nu > y > B_{\nu+1}$.



We shall need only the fact that $\omega(y)$ possesses $\tfrac{1}{2}m$ continuous derivatives at
these places. (m will be assumed to be even, cf. below.) Its behavior at
$y = B_1, \ldots, B_m$ will follow from our subsequent results in all cases where we
need it.

In order to determine $\omega(y)$ from (1), as we now propose to do, it is very
convenient to assume that m is even. We therefore make this assumption, and
shall maintain it throughout most of what follows.

Consider a $y_0 \ne B_1, \ldots, B_m$ in $B_1 \ge y_0 \ge B_m$. Then $B_\nu > y_0 > B_{\nu+1}$ for
a suitable $\nu = 1, \ldots, m - 1$. Now put

$$z = y_0 + it \qquad (t \text{ real and} > 0),$$

form (1), take the imaginary parts of both sides, and let $t \to 0$.

Consider first the left-hand side of (1). Since $\omega(y)$ possesses $\tfrac{1}{2}m$ continuous
derivatives at $y = y_0$, we have

$$\omega(y) = \sum_{k=0}^{\frac{1}{2}m - 1} \theta_k (y - y_0)^k + e(y)(y - y_0)^{\frac{1}{2}m}$$

with a bounded e(y). Clearly

$$\theta_k = \frac{1}{k!}\left[\frac{d^k \omega(y)}{dy^k}\right]_{y = y_0}.$$

Thus, since $\omega(y)$ is real, all $\theta_k$ are real and e(y) is also real.

Compute now the contribution of each one of the $\tfrac{1}{2}m + 1$ terms in the above
expression for $\omega(y)$ to the imaginary part of the left-hand side of (1).

The last term, $e(y)(y - y_0)^{\frac{1}{2}m}$, gives

$$\Im \int_{B_m}^{B_1} (y - y_0 - it)^{-\frac{1}{2}m}\, e(y)(y - y_0)^{\frac{1}{2}m}\, dy = \Im \int_{B_m}^{B_1} \Big(\frac{y - y_0}{y - y_0 - it}\Big)^{\frac{1}{2}m} e(y)\, dy.$$

The integrand is uniformly bounded, and so the reality conditions cause the
entire expression to $\to 0$ for $t \to 0$. Hence the contribution of this term is zero
for $t \to 0$.

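The other $\frac{m}{2}$ terms correspond to $k = 0, 1, \ldots, \frac{m}{2} - 1$, the $k$-th term being

$$\Im \int_{B_m}^{B_1} (y - y_0 - it)^{-m/2} \beta_k (y - y_0)^k\, dy = \beta_k\, \Im \int_{B_m}^{B_1} \frac{(y - y_0)^k}{(y - y_0 - it)^{m/2}}\, dy$$

$$= \beta_k\, \Im \int_{B_m}^{B_1} \frac{\sum_{h=0}^{k} \binom{k}{h} (it)^h (y - y_0 - it)^{k-h}}{(y - y_0 - it)^{m/2}}\, dy = \beta_k \sum_{h=0}^{k} \binom{k}{h}\, \Im \left\{ (it)^h \int_{B_m}^{B_1} (y - y_0 - it)^{k - h - m/2}\, dy \right\}.$$

The exponent $k - h - \frac{m}{2}$ in the integral is always $\leq \left( \frac{m}{2} - 1 \right) - 0 - \frac{m}{2} = -1$,
and it is $= -1$ if and only if $k = \frac{m}{2} - 1$, $h = 0$. Consider first a term where
this is not the case, i.e. where the exponent $k - h - \frac{m}{2} < -1$. For such a term
the expression

$$\Im \left\{ (it)^h \int_{B_m}^{B_1} (y - y_0 - it)^{k - h - m/2}\, dy \right\}$$

becomes

$$\Im \left\{ (it)^h\, \frac{1}{k - h - \frac{m}{2} + 1} \left[ (y - y_0 - it)^{k - h - m/2 + 1} \right]_{y = B_m}^{y = B_1} \right\}.$$

For $t \to 0$ the last factors are bounded and real, and so the entire expression $\to 0$:
for $h = 0$ because of the reality conditions, for $h > 0$ because of $(it)^h \to 0$.

Thus only the term $k = \frac{m}{2} - 1$, $h = 0$ can contribute something other than zero
for $t \to 0$.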

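Now this term is equal to

$$\beta_{m/2-1}\, \Im \left[ \ln (y - y_0 - it) \right]_{y = B_m}^{y = B_1},$$

and for $t \to 0$ this converges to

$$\pi \beta_{m/2-1} = \frac{\pi}{\left( \frac{m}{2} - 1 \right)!} \left. \frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y) \right|_{y = y_0}.$$

Thus the imaginary part of the entire left-hand side of (1) converges for
$t \to 0$ to this expression.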

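The right-hand side of (1) is easier to discuss. The imaginary part under
consideration is now

$$\Im \prod_{\mu=1}^{m} (B_\mu - z)^{-1/2} = \Im \prod_{\mu=1}^{m} (B_\mu - y_0 - it)^{-1/2}.$$

Considering the footnote cited earlier (its $z$ is our $y_0 + it$), this converges for $t \to 0$ to

$$\Im \prod_{\mu=1}^{\nu} (B_\mu - y_0)^{-1/2} \prod_{\mu=\nu+1}^{m} i\, (y_0 - B_\mu)^{-1/2} = \Im\, \frac{i^{\,m-\nu}}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y_0|}}.$$

This evaluation of $\left[ \ln (y - y_0 - it) \right]_{y = B_m}^{y = B_1}$ is based on $t > 0$, and the fact that $y$ moves
on the real axis from $B_m$ to $B_1$. It has no connection with the footnote cited above.

The square roots of the (real and) $> 0$ quantities
$B_\mu - y_0$ ($\mu = 1, \ldots, \nu$), $y_0 - B_\mu$ ($\mu = \nu + 1, \ldots, m$), and $|B_\mu - y_0|$
are taken to be $> 0$.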





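If $\nu$ (hence $m - \nu$) is even, then this is zero. If $\nu$ (hence $m - \nu$) is odd, then
this is equal to $(-1)^{(m-\nu-1)/2} \dfrac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y_0|}}$. Thus (1) becomes the following
equation:

$$\frac{\pi}{\left( \frac{m}{2} - 1 \right)!} \left. \frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y) \right|_{y = y_0} = \begin{cases} 0 & \text{if } \nu \text{ is even,} \\[4pt] (-1)^{(m-\nu-1)/2} \dfrac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y_0|}} & \text{if } \nu \text{ is odd.} \end{cases}$$

Simplifying this, and writing $y$ for $y_0$, and also restating the definition of $\nu$ gives

$$\frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y) = \begin{cases} 0 & \text{if } \nu \text{ is even,} \\[4pt] (-1)^{(m-\nu-1)/2}\, \dfrac{\left( \frac{m}{2} - 1 \right)!}{\pi}\, \dfrac{1}{\sqrt{\prod_{\mu=1}^{m} |B_\mu - y|}} & \text{if } \nu \text{ is odd,} \end{cases} \tag{2}$$

$$B_\nu > y > B_{\nu+1}, \qquad \nu = 1, \ldots, m - 1.$$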


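Observe finally, that if we put

$$\mathfrak{A}(y) = \prod_{\mu=1}^{m} (y - B_\mu),$$

then this product has $\nu$ factors $< 0$ ($\mu = 1, \ldots, \nu$), while the others are $> 0$.
So

$$\mathfrak{A}(y) > 0 \text{ for } \nu \text{ even}, \qquad \mathfrak{A}(y) < 0 \text{ for } \nu \text{ odd},$$

and in the latter case

$$\prod_{\mu=1}^{m} |B_\mu - y| = -\mathfrak{A}(y).$$

It is clear how we may now rewrite (2).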

We are now in a position to determine the behavior of $\omega(y)$ at $y = B_1, \ldots, B_m$
too, since we know how its $\left( \frac{m}{2} - 1 \right)$-th derivative behaves in the immediate
vicinity of these places. (2) shows that it is singular there, and that the nature
of the singularity depends on the number of the $\mu$ for which $B_\mu$ is equal to the
$y$ in question, i.e. on the multiplicity of this root of our polynomial $\mathfrak{A}(y)$.

In our actual application (to $\gamma = \epsilon$, cf. the beginning of this section) the
$B_\mu$ are pairwise different, i.e. all root multiplicities of $\mathfrak{A}(y)$ are equal to one. A
further special case, which has a certain interest of its own, is when the $B_\mu$ are
equal two by two, but otherwise different, i.e. all root multiplicities of $\mathfrak{A}(y)$ are
equal to two. In the discussion which follows we shall therefore assume that
one or the other of these two cases occurs.

In the first case $\frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y)$ has on each side of a $y = B_\mu$ one of these two
behaviors: It is identically zero, or it is singular, of the type $\dfrac{1}{\sqrt{|B_\mu - y|}}$. Thus
it is at any rate integrable. Consequently $\frac{d^{m/2-2}}{dy^{m/2-2}} \omega(y)$ is continuous on each
side of $y = B_\mu$, i.e. for both $y = B_\mu \pm 0$. Successive integrations give now
that all $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{m}{2} - 2$, are continuous for both $y = B_\mu \pm 0$.

In the second case we have $B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. So
the $\nu$ with $B_\nu > y > B_{\nu+1}$ is necessarily even, and $\frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y)$ is identically zero
for all $y$ of (2). Consequently $\frac{d^{m/2-2}}{dy^{m/2-2}} \omega(y)$ is again continuous on each side of
$y = B_\mu$, i.e. for both $y = B_\mu \pm 0$. Successive integrations show again that all
$\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{m}{2} - 2$, are continuous for both $y = B_\mu \pm 0$.

We must therefore discuss only how much the $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{m}{2} - 2$,
change from $y = B_\mu - 0$ to $y = B_\mu + 0$.

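Let us return to the procedure by which we derived (2) from (1). We put
again

$$z = y_0 + it \qquad (t \text{ real and } > 0)$$

and let $t \to 0$. But we consider now (1) itself (and not merely its imaginary
part), and we choose $y_0 = B_\nu$.

Consider first the left-hand side of (1), always disregarding terms which stay
bounded for $t \to 0$. Then we can replace the integral $\int_{B_m}^{B_1}$ of (1) by any $\int_{B_\nu - a}^{B_\nu + a}$
with any fixed $a > 0$, and this is equal to

$$\int_{B_\nu - a}^{B_\nu - 0} + \int_{B_\nu + 0}^{B_\nu + a}.$$

We choose this $a > 0$ so small that no $B_\mu \neq B_\nu$ lies between $B_\nu - a$ and $B_\nu + a$.
I.e. all $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{m}{2} - 2$, are continuous from $B_\nu - a$ to
$B_\nu - 0$ and also from $B_\nu + 0$ to $B_\nu + a$.

This being the case, we can evaluate the above sum of two integrals by $\frac{m}{2} - 1$
successive partial integrations. Thus we get

$$-\sum_{k=0}^{m/2-2} \frac{\left( \frac{m}{2} - 2 - k \right)!}{\left( \frac{m}{2} - 1 \right)!} \left[ (y - B_\nu - it)^{-(m/2 - 1 - k)} \frac{d^k}{dy^k} \omega(y) \right]_{y = B_\nu - a}^{y = B_\nu - 0}$$

$$-\sum_{k=0}^{m/2-2} \frac{\left( \frac{m}{2} - 2 - k \right)!}{\left( \frac{m}{2} - 1 \right)!} \left[ (y - B_\nu - it)^{-(m/2 - 1 - k)} \frac{d^k}{dy^k} \omega(y) \right]_{y = B_\nu + 0}^{y = B_\nu + a}$$

$$+ \frac{1}{\left( \frac{m}{2} - 1 \right)!} \int_{B_\nu - a}^{B_\nu + a} (y - B_\nu - it)^{-1} \frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y)\, dy.$$

In the first two lines the $y = B_\nu \pm a$ terms are bounded for $t \to 0$, therefore
only the $y = B_\nu \pm 0$ terms need be kept. Then the first two lines give

$$\sum_{k=0}^{m/2-2} \frac{\left( \frac{m}{2} - 2 - k \right)!}{\left( \frac{m}{2} - 1 \right)!} (-it)^{-(m/2 - 1 - k)} \left[ \frac{d^k}{dy^k} \omega(y) \right]_{y = B_\nu - 0}^{y = B_\nu + 0}$$

up to terms which stay bounded for $t \to 0$. Consider now the third line. We
know that the $\frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y)$ in its integrand can be majorized by $\dfrac{c_1}{\sqrt{|y - B_\nu|}}$ (for
a suitable constant $c_1$, cf. our discussion preceding the present one). Thus the
integral in question is majorized by

$$\int_{B_\nu - a}^{B_\nu + a} |y - B_\nu - it|^{-1} c_1 |y - B_\nu|^{-1/2}\, dy,$$

hence a fortiori by

$$\int_{-\infty}^{\infty} |y - B_\nu - it|^{-1} c_1 |y - B_\nu|^{-1/2}\, dy = c_1 t^{-1/2} \int_{-\infty}^{\infty} |u - i|^{-1} |u|^{-1/2}\, du$$

$$= c_1 t^{-1/2} \int_{-\infty}^{\infty} \frac{du}{\sqrt{u^2 + 1}\, \sqrt{|u|}} = 4 c_1 t^{-1/2} \int_{0}^{\infty} \frac{dv}{\sqrt{v^4 + 1}}.$$

Introduce the new integration variable $u = \dfrac{y - B_\nu}{t}$.

Introduce the new integration variable $v = \sqrt{|u|}$.

Since the last integration is obviously finite, the entire expression is $O(t^{-1/2})$ for
$t \to 0$.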

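Consequently the left-hand side of (1) is equal to

$$\sum_{k=0}^{m/2-2} \frac{\left( \frac{m}{2} - 2 - k \right)!}{\left( \frac{m}{2} - 1 \right)!} (-it)^{-(m/2 - 1 - k)} \left[ \frac{d^k}{dy^k} \omega(y) \right]_{y = B_\nu - 0}^{y = B_\nu + 0} + O(t^{-1/2}),$$

for $t \to 0$. (For $B_\nu = B_1$ or $B_m$ the $\frac{d^k}{dy^k} \omega(y)$ at $y = B_1 + 0$ or $B_m - 0$, respectively,
must obviously be taken to be zero.)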


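Consider now the right-hand side of (1).

We first suppose the $B_\mu$ are pairwise different. The right-hand side in question
is $\prod_{\mu=1}^{m} (B_\mu - B_\nu - it)^{-1/2}$, i.e. $O(t^{-1/2})$.

Secondly let us consider $B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. So we
may assume $\nu = 2\lambda$, $\lambda = 1, \ldots, \frac{m}{2}$. The right-hand side of (1) becomes now
a rational function, $\pm \dfrac{1}{\prod_{k=1}^{m/2} (B_{2k} - z)}$. (The sign is determined by the footnote cited earlier.) So in our case
it is

$$\frac{1}{\prod_{k=1}^{m/2} (B_{2k} - B_{2\lambda} - it)},$$

i.e.

$$\frac{(-1)^{m/2 - \lambda}}{\prod_{k=1}^{\lambda - 1} (B_{2k} - B_{2\lambda}) \cdot \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}\, (-it)^{-1} + O(1).$$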

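Comparing these with our above expression gives therefore (for $t \to 0$)

$$\sum_{k=0}^{m/2-2} \frac{\left( \frac{m}{2} - 2 - k \right)!}{\left( \frac{m}{2} - 1 \right)!} (-it)^{-(m/2 - 1 - k)} \left[ \frac{d^k}{dy^k} \omega(y) \right]_{y = B_\nu - 0}^{y = B_\nu + 0} = \begin{cases} O(t^{-1/2}) & \text{in the first case,} \\[6pt] \dfrac{(-1)^{m/2 - \lambda}}{\prod_{k=1}^{\lambda-1} (B_{2k} - B_{2\lambda}) \cdot \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})} (-it)^{-1} + O(t^{-1/2}) & \text{in the second case.} \end{cases}$$

In this formula the left-hand side is a polynomial in $(-it)^{-1}$. Hence the $O(t^{-1/2})$
terms on the right-hand side must vanish, and otherwise all powers of $-it$ must
have the same coefficient on both sides. Consequently

$$\left[ \frac{d^k}{dy^k} \omega(y) \right]_{y = B_\nu - 0}^{y = B_\nu + 0}$$

must vanish, except in the second case for the one value of $k$ with
$-\left( \frac{m}{2} - 1 - k \right) = -1$, i.e. $k = \frac{m}{2} - 2$. So, with this one exception, we have

$$\left. \frac{d^k}{dy^k} \omega(y) \right|_{y = B_\nu + 0} = \left. \frac{d^k}{dy^k} \omega(y) \right|_{y = B_\nu - 0}.$$

And in the exceptional case (second case, $\nu = 2\lambda$)

$$\left[ \frac{d^{m/2-2}}{dy^{m/2-2}} \omega(y) \right]_{y = B_{2\lambda} - 0}^{y = B_{2\lambda} + 0} = (-1)^{m/2 - \lambda}\, \frac{\left( \frac{m}{2} - 1 \right)!}{\prod_{k=1}^{\lambda-1} (B_{2k} - B_{2\lambda}) \cdot \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}.$$

Thus in the first case all derivatives $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{m}{2} - 2$, are
continuous even at $y = B_1, \ldots, B_m$.

In the second case the same is true for $k = 0, 1, \ldots, \frac{m}{2} - 3$, but the derivative
with $k = \frac{m}{2} - 2$ behaves differently for $y = B_1, \ldots, B_m$. Indeed, for
$y = B_{2\lambda - 1} = B_{2\lambda}$, $\lambda = 1, \ldots, \frac{m}{2}$, this derivative is continuous for both
$y = B_{2\lambda} \pm 0$, but it increases from $B_{2\lambda} - 0$ to $B_{2\lambda} + 0$ by

$$(-1)^{m/2 - \lambda}\, \frac{\left( \frac{m}{2} - 1 \right)!}{\prod_{k=1}^{\lambda-1} (B_{2k} - B_{2\lambda}) \cdot \prod_{k=\lambda+1}^{m/2} (B_{2\lambda} - B_{2k})}.$$

(At $y = B_1 + 0$ and $B_m - 0$ the $\frac{d^k}{dy^k} \omega(y)$ must be thought to continue with the
value zero.)

These rules, together with (2), determine $\omega(y)$ completely.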


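6. First special case. We consider the first special case, where the $B_\mu$ are
pairwise different. We immediately specialize further, to $\gamma = \epsilon$, i.e. $m = n - 1$,
$B_\mu = \cos \frac{\mu \pi}{n}$ ($\mu = 1, \ldots, n - 1$). (Cf. the beginning of the preceding section.)

Since $m$ must be even, $n$ must be odd. The rules of section 5 determine
$\omega(y)$; in particular all derivatives $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{n-1}{2} - 2$, are
everywhere continuous, beginning and ending with zero at $y = B_1$ and $B_{n-1}$,
respectively.

In the even intervals

$$B_2 \geq y \geq B_3, \quad B_4 \geq y \geq B_5, \quad \ldots, \quad B_{n-3} \geq y \geq B_{n-2},$$

the derivative $\frac{d^{\frac{1}{2}(n-1)-1}}{dy^{\frac{1}{2}(n-1)-1}} \omega(y)$ is zero, i.e. $\omega(y)$ is a polynomial of degree
$\frac{1}{2}(n - 1) - 2$. In the odd intervals

$$B_1 \geq y \geq B_2, \quad B_3 \geq y \geq B_4, \quad \ldots, \quad B_{n-2} \geq y \geq B_{n-1},$$

we have

$$\frac{d^{\frac{1}{2}(n-1)-1}}{dy^{\frac{1}{2}(n-1)-1}} \omega(y) = \pm \frac{\left( \frac{1}{2}[n - 1] - 1 \right)!}{\pi}\, \frac{1}{\sqrt{-\mathfrak{A}(y)}}$$

(the sign $\pm$ is alternating from one odd interval to the next, cf. (2)), where

$$\mathfrak{A}(y) = \prod_{\mu=1}^{n-1} \left( y - \cos \frac{\mu \pi}{n} \right).$$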

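Another expression for $\mathfrak{A}(y)$ may be found by the following method.
Clearly

$$\frac{\sin (n\varphi)}{\sin \varphi} = \frac{e^{in\varphi} - e^{-in\varphi}}{e^{i\varphi} - e^{-i\varphi}} = \sum_{\mu=0}^{n-1} e^{i(n - 1 - 2\mu)\varphi}$$

is a polynomial of $\cos \varphi$ of degree $n - 1$, with the highest coefficient
$2^{n-1}$. For $\varphi = \frac{\mu \pi}{n}$, $\mu = 1, \ldots, n - 1$, $\sin (n\varphi) = 0$, $\sin \varphi \neq 0$, hence
$\frac{\sin (n\varphi)}{\sin \varphi}$, as a polynomial in $\cos \varphi$, has the same roots as $\mathfrak{A}(y)$. $\mathfrak{A}(y)$ is a
polynomial of degree $n - 1$ with the highest coefficient 1. Consequently

$$\mathfrak{A}(\cos \varphi) = \frac{\sin (n\varphi)}{2^{n-1} \sin \varphi}.$$

This formula allows one to compute $\mathfrak{A}(y)$ quickly; examples are

$n = 3$: $\mathfrak{A}(y) = y^2 - \frac{1}{4}$,
$n = 5$: $\mathfrak{A}(y) = y^4 - \frac{3}{4} y^2 + \frac{1}{16}$,
$n = 7$: $\mathfrak{A}(y) = y^6 - \frac{5}{4} y^4 + \frac{3}{8} y^2 - \frac{1}{64}$.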
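
The three examples can be checked numerically. The following is a minimal sketch in Python, not part of the original paper: the function names `monic_root_polynomial`, `evaluate` and `closed_form` are hypothetical, and the test angle 0.7 is an arbitrary choice. It multiplies out the product over the roots $\cos(\mu\pi/n)$ and compares the result with the closed form $\sin(n\varphi) / (2^{n-1} \sin\varphi)$.

```python
import math


def monic_root_polynomial(n):
    """Coefficients, highest power first, of prod_{mu=1}^{n-1} (y - cos(mu*pi/n))."""
    coeffs = [1.0]
    for mu in range(1, n):
        root = math.cos(mu * math.pi / n)
        # Multiply the current polynomial by (y - root).
        shifted = coeffs + [0.0]
        for i, c in enumerate(coeffs):
            shifted[i + 1] -= root * c
        coeffs = shifted
    return coeffs


def evaluate(coeffs, y):
    """Horner evaluation of a polynomial given highest power first."""
    value = 0.0
    for c in coeffs:
        value = value * y + c
    return value


def closed_form(n, phi):
    """sin(n*phi) / (2^(n-1) * sin(phi)); requires sin(phi) != 0."""
    return math.sin(n * phi) / (2 ** (n - 1) * math.sin(phi))


if __name__ == "__main__":
    phi = 0.7  # arbitrary test angle
    for n in (3, 5, 7):
        coeffs = monic_root_polynomial(n)
        print(n, [round(c, 6) for c in coeffs],
              evaluate(coeffs, math.cos(phi)), closed_form(n, phi))
```

The product form and the closed form should agree to rounding error at any angle with $\sin\varphi \neq 0$, and the odd-power coefficients should vanish because the roots are symmetric about 0.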

The number of odd intervals, on which integrations must be carried out,
is $\frac{1}{2}(n - 1)$, but since those which are symmetric with respect to 0 require the
same computations, only $\frac{1}{4}(n - 1)$ or $\frac{1}{4}(n + 1)$ must be considered. So there are
1, 1, 2, ... such intervals for $n = 3, 5, 7, \ldots$ respectively. The integrals are
first elementary (arcsin), then elliptic, then hyperelliptic.

Numerical computations for $n = 3$ are immediate; for $n = 5, 7$ they have
been carried out with considerable precision by B. I. Hart.

At $y = B_\mu$, $\frac{d^{\frac{1}{2}(n-1)-1}}{dy^{\frac{1}{2}(n-1)-1}} \omega(y)$ has a singularity of the type $\dfrac{1}{\sqrt{|B_\mu - y|}}$ (cf. the end
of section 5), while all $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{1}{2}(n - 1) - 2$, are continuous.
At $y = B_1$ and $B_{n-1}$, in particular, they are zero. Hence it follows by successive
integrations that the order of vanishing of $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{1}{2}(n - 1) - 2$,
at $y = B_1$ and $B_{n-1}$ is $\left( \frac{1}{2}(n - 1) - 1 \right) - k - \frac{1}{2} = \frac{n}{2} - 2 - k$. In particular
for $k = 0$ we find that at its maximum and at its minimum $B_1$ and $B_{n-1}$,
i.e. $\pm \cos \frac{\pi}{n}$, the order of vanishing of $\omega(y)$ is $\frac{n}{2} - 2$.

Since $\omega(y)$ has this property, and since it is obviously an even function of $y$,
R. H. Kent has suggested approximating it by a series expansion of the form

$$\omega(y) = \sum_{h=0}^{\infty} a_h \left( \cos^2 \frac{\pi}{n} - y^2 \right)^{\frac{1}{2}n - 2 + h}. \tag{3}$$

Computations by B. I. Hart, not yet published, have shown that even the use
of the first four terms ($h = 0, 1, 2, 3$, the $a_h$ being determined by the condition
of normalization and by the first three even moments of the actual distribution
given in section 4) gives excellent approximations. The use of the formula (3)
suggests itself likewise for even values of $n$.


7. Second special case. We consider now the second special case, where
$B_1 = B_2 > B_3 = B_4 > \cdots > B_{m-1} = B_m$. This has no immediate bearing on
our original problem (cf. the preceding section), but we shall nevertheless discuss 
it for the two following reasons. First, it is hoped that the reader will find an in- 
dependent interest in the simple and complete results which can be obtained in 
this case. Second, there are various modifications of our original problem, which 
lead to this case. For example let the $x_1, \ldots, x_n$ in our original problem, as
described in section 1, be complex numbers instead of real ones, replacing all
squares by absolute value squares. Then one verifies easily that all character- 
istic values Xi , • • • , Xn-i are doubled, and so our first case goes over into our 
second case. (This amounts to replacing our quadratic forms by Hermitian 
forms, cf.^) It is easy to imagine two-dimensional problems where this set-up 
is natural. 


We put $C_\lambda = B_{2\lambda - 1} = B_{2\lambda}$ for $\lambda = 1, \ldots, \frac{m}{2}$, so that $C_1 > C_2 > \cdots > C_{m/2}$
are the only restrictions imposed.

Every $y$ in $B_1 \geq y \geq B_m$, i.e. in $C_1 \geq y \geq C_{m/2}$, lies in an interval $C_\lambda \geq y \geq C_{\lambda+1}$,
i.e. $B_{2\lambda} \geq y \geq B_{2\lambda+1}$. That is, the $\nu$ of (2) is always even, and so $\frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y)$
is zero in every one of these intervals. Therefore $\omega(y)$ is a polynomial of degree
$\frac{m}{2} - 2$ in every one of these intervals. We have already shown that $\omega(y)$ is
not the same polynomial in each interval. Thus $\omega(y)$ is represented by $\frac{m}{2} - 1$
polynomials of degree $\frac{m}{2} - 2$ in the $\frac{m}{2} - 1$ intervals

$$C_1 \geq y \geq C_2, \quad C_2 \geq y \geq C_3, \quad \ldots, \quad C_{m/2-1} \geq y \geq C_{m/2}.$$

We omit the simple discussion of $n = 3$, which must be excluded from this result.

We could try to obtain explicit expressions for these polynomials by a direct
application of the results at the close of section 5. A characterization of the
distribution can, however, be obtained in a more elegant way by an indirect
procedure.

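Consider an arbitrary function $\mathfrak{f}(y)$. We wish to express its mean

$$\overline{\mathfrak{f}(y)} = \int_{C_{m/2}}^{C_1} \mathfrak{f}(y) \omega(y)\, dy.$$

If we can do this for all $\mathfrak{f}(y)$ then the distribution is completely characterized.

We select first an $\left( \frac{m}{2} - 1 \right)$-fold primitive function of $\mathfrak{f}(y)$, i.e. a function $\mathfrak{F}(y)$
with

$$\frac{d^{m/2-1}}{dy^{m/2-1}} \mathfrak{F}(y) = \mathfrak{f}(y).$$

Of course $\mathfrak{F}(y)$ is determined only up to an additive polynomial of degree $\frac{m}{2} - 2$
in $y$.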

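Now the above expectation value becomes

$$\overline{\mathfrak{f}(y)} = \int_{C_{m/2}}^{C_1} \frac{d^{m/2-1}}{dy^{m/2-1}} \mathfrak{F}(y)\, \omega(y)\, dy = \sum_{\lambda=1}^{m/2-1} \int_{C_{\lambda+1} + 0}^{C_\lambda - 0} \frac{d^{m/2-1}}{dy^{m/2-1}} \mathfrak{F}(y)\, \omega(y)\, dy.$$

Since all $\frac{d^k}{dy^k} \omega(y)$, $k = 0, 1, \ldots, \frac{m}{2} - 2$, are continuous from $C_{\lambda+1} + 0$ to $C_\lambda - 0$
for all $\lambda = 1, \ldots, \frac{m}{2} - 1$, we can evaluate each integral of the above sum by
$\frac{m}{2} - 1$ successive partial integrations. Thus the following expression obtains:

$$\sum_{\lambda=1}^{m/2-1} \sum_{k=0}^{m/2-2} (-1)^k \left[ \frac{d^{m/2-k-2}}{dy^{m/2-k-2}} \mathfrak{F}(y) \cdot \frac{d^k}{dy^k} \omega(y) \right]_{y = C_{\lambda+1} + 0}^{y = C_\lambda - 0}$$

$$+\, (-1)^{m/2-1} \sum_{\lambda=1}^{m/2-1} \int_{C_{\lambda+1} + 0}^{C_\lambda - 0} \mathfrak{F}(y)\, \frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y)\, dy.$$

Considering the definition of $\mathfrak{F}(y)$ as an $\left( \frac{m}{2} - 1 \right)$-fold primitive function, the
$\frac{d^{k'}}{dy^{k'}} \mathfrak{F}(y)$, $k' = 0, 1, \ldots, \frac{m}{2} - 2$, are everywhere continuous. This corresponds
to $k' = \frac{m}{2} - 2 - k$, $k = 0, 1, \ldots, \frac{m}{2} - 2$. Hence the first line can be rewritten
as

$$-\sum_{\lambda=1}^{m/2} \sum_{k=0}^{m/2-2} (-1)^k \left. \frac{d^{m/2-k-2}}{dy^{m/2-k-2}} \mathfrak{F}(y) \right|_{y = C_\lambda} \left[ \frac{d^k}{dy^k} \omega(y) \right]_{y = C_\lambda - 0}^{y = C_\lambda + 0}.$$

(For $C_\lambda = C_1$ or $C_{m/2}$ the $\frac{d^k}{dy^k} \omega(y)$ at $y = C_1 + 0$ or $C_{m/2} - 0$, respectively, must
obviously be taken to be zero.) Owing to the results of section 5 all terms
with $k = 0, 1, \ldots, \frac{m}{2} - 3$ vanish, and the term with $k = \frac{m}{2} - 2$ gives

$$-\sum_{\lambda=1}^{m/2} (-1)^{m/2-2}\, \mathfrak{F}(C_\lambda)\, (-1)^{m/2 - \lambda}\, \frac{\left( \frac{m}{2} - 1 \right)!}{\prod_{k=1}^{\lambda-1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}$$

$$= \left( \frac{m}{2} - 1 \right)! \sum_{\lambda=1}^{m/2} (-1)^{\lambda - 1}\, \frac{\mathfrak{F}(C_\lambda)}{\prod_{k=1}^{\lambda-1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}.$$

The second line vanishes, since $\frac{d^{m/2-1}}{dy^{m/2-1}} \omega(y)$ is zero everywhere, as observed above.

Finally

$$\overline{\mathfrak{f}(y)} = \left( \frac{m}{2} - 1 \right)! \sum_{\lambda=1}^{m/2} (-1)^{\lambda - 1}\, \frac{\mathfrak{F}(C_\lambda)}{\prod_{k=1}^{\lambda-1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k)}.$$

For

$$\mathfrak{C}(z) = \prod_{k=1}^{m/2} (z - C_k)$$

we have

$$\left. \frac{d}{dz} \mathfrak{C}(z) \right|_{z = C_\lambda} = \prod_{k=1,\ k \neq \lambda}^{m/2} (C_\lambda - C_k) = (-1)^{\lambda - 1} \prod_{k=1}^{\lambda-1} (C_k - C_\lambda) \prod_{k=\lambda+1}^{m/2} (C_\lambda - C_k).$$

Therefore the above formula can also be written

$$\overline{\mathfrak{f}(y)} = \left( \frac{m}{2} - 1 \right)! \sum_{\lambda=1}^{m/2} \frac{\mathfrak{F}(C_\lambda)}{\left. \frac{d}{dz} \mathfrak{C}(z) \right|_{z = C_\lambda}}.$$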

Observe that the right-hand side of the above formula (which can also be
easily expressed in terms of determinants) is a well-known approximate
expression for $\frac{d^{m/2-1}}{dy^{m/2-1}} \mathfrak{F}(y)$, as a (repeated) difference quotient of the values $\mathfrak{F}(C_\lambda)$,
$\lambda = 1, \ldots, \frac{m}{2}$. It is therefore very satisfactory that this expression gives the
mean of

$$\mathfrak{f}(y) = \frac{d^{m/2-1}}{dy^{m/2-1}} \mathfrak{F}(y).$$
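
As an illustration of the repeated difference quotient, here is a small Python sketch. It is not taken from the paper: the function name `mean_via_divided_difference` and the sample values of the $C_\lambda$ are hypothetical. It assumes the formula in the form of $(p - 1)!$ times the sum of $\mathfrak{F}(C_\lambda)$ divided by the product of $(C_\lambda - C_k)$ over $k \neq \lambda$, with $p = m/2$ distinct values $C_\lambda$.

```python
from math import factorial


def mean_via_divided_difference(C, F):
    """(p-1)! * sum_lambda F(C_lambda) / prod_{k != lambda} (C_lambda - C_k).

    C : list of p distinct real numbers (the C_lambda).
    F : a (p-1)-fold primitive of the function f whose mean is wanted.
    """
    p = len(C)
    total = 0.0
    for lam, c in enumerate(C):
        denom = 1.0
        for k, ck in enumerate(C):
            if k != lam:
                denom *= (c - ck)
        total += F(c) / denom
    return factorial(p - 1) * total


if __name__ == "__main__":
    C = [3.0, 1.0, 0.0]  # hypothetical values with C_1 > C_2 > C_3, so p = 3
    # f(y) = 1 has the 2-fold primitive y^2 / 2; its mean must be 1.
    print(mean_via_divided_difference(C, lambda y: y ** 2 / 2))
    # f(y) = y has the 2-fold primitive y^3 / 6.
    print(mean_via_divided_difference(C, lambda y: y ** 3 / 6))
```

Adding a polynomial of degree $p - 2$ to `F` does not change the result, which matches the remark that the primitive is determined only up to such a polynomial. In this example the mean of $y$ comes out as the arithmetic mean of the three $C_\lambda$.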


Appendix. We return to the normal distribution of $x_1, \ldots, x_n$ as described
in section 1, and to the quantities $\delta^2$, $s^2$, $\eta$ given there. We denote means with
respect to that distribution by $\overline{(\cdots)}$.

It was observed by B. I. Hart and mentioned by J. D. Williams, by comparing
the known expressions for their moments, that every moment of $\eta = \frac{\delta^2}{s^2}$
is the quotient of the corresponding moments of $\delta^2$ and of $s^2$. That is

$$\overline{\eta^p} = \frac{\overline{\delta^{2p}}}{\overline{s^{2p}}} \qquad (p = 0, 1, 2, \ldots).$$

This indicates some kind of independence relation among these quantities. The
considerations which follow are intended to clarify this situation.

The above relation may be written

$$\overline{s^{2p} \eta^p} = \overline{s^{2p}} \cdot \overline{\eta^p},$$

or, more generally,

$$\overline{s^{2p} \eta^q} = \overline{s^{2p}} \cdot \overline{\eta^q}.$$

We shall prove this by showing that $s$ and $\eta$ are statistically independent.

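We can, as in section 2, make the mean $\xi = 0$, i.e. obtain the $x_1, \ldots, x_n$
distribution law

$$c^n e^{-\frac{1}{2\sigma^2} \sum_{\mu=1}^{n} x_\mu^2}\, dx_1 \cdots dx_n.$$

And then, again as in section 2, perform a linear orthogonal transformation,
carrying $x_1, \ldots, x_n$ into, say $x_1', \ldots, x_n'$, which leaves the distribution law in
its original form

$$c^n e^{-\frac{1}{2\sigma^2} \sum_{\mu=1}^{n} x_\mu'^2}\, dx_1' \cdots dx_n',$$

and makes

$$s^2 = \frac{1}{n} \sum_{\mu=1}^{n-1} x_\mu'^2, \qquad \eta = \frac{n}{n - 1}\, \frac{\sum_{\mu=1}^{n-1} A_\mu x_\mu'^2}{\sum_{\mu=1}^{n-1} x_\mu'^2}.$$

Since $x_n'$ does not occur in $s^2$, $\eta$ we must use only the $x_1', \ldots, x_{n-1}'$ distribution
law

$$c^{n-1} e^{-\frac{1}{2\sigma^2} \sum_{\mu=1}^{n-1} x_\mu'^2}\, dx_1' \cdots dx_{n-1}'.$$

Now we introduce polar coordinates with respect to $x_1', \ldots, x_{n-1}'$. These
consist of a radius $r$ with

$$r^2 = \sum_{\mu=1}^{n-1} x_\mu'^2$$

and $n - 2$ angular variables $\varphi_1, \ldots, \varphi_{n-2}$, which can be chosen in various
ways, and which we need not describe more closely. At any rate

$$dx_1' \cdots dx_{n-1}' = r^{n-2}\, dr\; w(\varphi_1, \ldots, \varphi_{n-2})\, d\varphi_1 \cdots d\varphi_{n-2}$$

where we need not determine the weight function $w(\varphi_1, \ldots, \varphi_{n-2})$. Consequently
the distribution law is

$$c^{n-1} e^{-\frac{r^2}{2\sigma^2}} r^{n-2}\, dr\; w(\varphi_1, \ldots, \varphi_{n-2})\, d\varphi_1 \cdots d\varphi_{n-2}.$$

Thus the coordinate $r$ and the coordinates $\varphi_1, \ldots, \varphi_{n-2}$ are independent of each
other.

Next

$$s^2 = \frac{r^2}{n},$$

and $\eta$ is a homogeneous function of $x_1', \ldots, x_{n-1}'$ of degree zero, i.e. it is
independent of $r$. So $s$ is a function of $r$ alone, and $\eta$ is a function of $\varphi_1, \ldots, \varphi_{n-2}$
alone. Consequently $s$ and $\eta$ likewise are independent.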
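
The independence statement can be probed by simulation. The sketch below is only an illustration under stated assumptions and is not from the paper: section 1 is not reproduced here, so the definitions $s^2 = \frac{1}{n} \sum (x_i - \bar{x})^2$ and $\delta^2 = \frac{1}{n-1} \sum (x_{i+1} - x_i)^2$ are assumed, and the function names, sample size and seed are hypothetical.

```python
import random
from statistics import mean


def simulate(n=5, trials=50_000, seed=0):
    """Draw normal samples; return lists of s^2, delta^2 and eta = delta^2 / s^2."""
    rng = random.Random(seed)
    s2_vals, d2_vals, eta_vals = [], [], []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        # Assumed definition of s^2 (divisor n).
        s2 = sum((xi - xbar) ** 2 for xi in x) / n
        # Assumed definition of delta^2 (mean square successive difference).
        d2 = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
        s2_vals.append(s2)
        d2_vals.append(d2)
        eta_vals.append(d2 / s2)
    return s2_vals, d2_vals, eta_vals


def correlation(a, b):
    """Plain sample correlation coefficient of two equally long lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


if __name__ == "__main__":
    s2, d2, eta = simulate()
    print("mean(eta)               :", mean(eta))
    print("mean(delta^2)/mean(s^2) :", mean(d2) / mean(s2))
    print("corr(s^2, eta)          :", correlation(s2, eta))
```

With these assumed definitions the first two printed numbers should agree up to sampling error, and the correlation should be close to zero. A vanishing correlation is of course only a necessary consequence of independence, not a proof of it.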

Added in proof:

After this manuscript was completed, Dr. T. Koopmans informed the author
of several results of his own, which he obtained in connection with other statistical
investigations. They have many points of contact with this investigation, and
will appear in the near future in the Annals of Mathematical Statistics. The
author wishes to express his thanks to Dr. T. Koopmans for his communications.



SOME EXAMPLES OF ASYMPTOTICALLY MOST POWERFUL TESTS 

By Abraham Wald¹

Columbia University

1. Introduction. In a previous paper² the author gave the definition of an
asymptotically most powerful test and has shown that the commonly used tests,
based on the maximum likelihood estimate, are asymptotically most powerful.

In this paper some further examples of asymptotically most powerful tests
will be given. Let us first restate the definition of an asymptotically most
powerful test. Let $f(x, \theta)$ be the probability density of a variate $x$ involving an
unknown parameter $\theta$. For testing the hypothesis $\theta = \theta_0$ by means of $n$
independent observations $x_1, \ldots, x_n$ on $x$ we have to choose a region of rejection
$W_n$ in the $n$-dimensional sample space. Denote by $P(W_n \mid \theta)$ the probability
that the sample point $E = (x_1, \ldots, x_n)$ will fall in $W_n$ under the assumption
that $\theta$ is the true value of the parameter. For any region $U_n$ of the $n$-dimensional
sample space denote by $g(U_n)$ the greatest lower bound of $P(U_n \mid \theta)$.
For any pair of regions $U_n$ and $T_n$ denote by $L(U_n, T_n)$ the least upper bound of

$$P(U_n \mid \theta) - P(T_n \mid \theta).$$

In all that follows we shall denote a region of the $n$-dimensional sample space
by a capital letter with the subscript $n$.

Definition 1: A sequence $\{W_n\}$ ($n = 1, 2, \ldots$, ad inf.) of regions is said to be
an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ on the level of
significance $\alpha$ if $P(W_n \mid \theta_0) = \alpha$ and if for any sequence $\{Z_n\}$ of regions for
which $P(Z_n \mid \theta_0) = \alpha$ the inequality

$$\limsup_{n \to \infty} L(Z_n, W_n) \leq 0$$

holds.

Definition 2: A sequence $\{W_n\}$ ($n = 1, 2, \ldots$, ad inf.) of regions is said to
be an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$ on
the level of significance $\alpha$ if $P(W_n \mid \theta_0) = \lim_{n \to \infty} g(W_n) = \alpha$, and if for any sequence
$\{Z_n\}$ of regions for which $P(Z_n \mid \theta_0) = \lim_{n \to \infty} g(Z_n) = \alpha$, the inequality

$$\limsup_{n \to \infty} L(Z_n, W_n) \leq 0$$

holds.

¹ Research under a grant-in-aid of the Carnegie Corporation of New York.

² "Asymptotically most powerful tests of statistical hypotheses," Annals of Math. Stat.,
Vol. 12 (1941).

Consider the expression

$$y_n(\theta) = \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial \theta} \log f(x_\alpha, \theta). \tag{1}$$

Let $W_n'$ be the region defined by the inequality $y_n(\theta_0) \geq c_n'$, $W_n''$ defined by the
inequality $y_n(\theta_0) \leq c_n''$, and $W_n$ defined by the inequality $|y_n(\theta_0)| \geq c_n$, where
the constants $c_n'$, $c_n''$ and $c_n$ are chosen such that

$$P(W_n' \mid \theta_0) = P(W_n'' \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$$

It will be shown in this paper that under certain restrictions on the probability
density $f(x, \theta)$ the sequence $\{W_n'\}$ is an asymptotically most powerful test of the
hypothesis $\theta = \theta_0$ if $\theta$ takes only values $\geq \theta_0$. Similarly $\{W_n''\}$ is an
asymptotically most powerful test if $\theta$ takes only values $\leq \theta_0$. Finally $\{W_n\}$ is an
asymptotically most powerful unbiased test if $\theta$ can take any real value.

Another example of an asymptotically most powerful unbiased test of the
hypothesis $\theta = \theta_0$, as it will be shown, is the critical region of type A in the
Neyman-Pearson theory of testing hypotheses. This fact gives a strong
justification for the use of the critical region of type A.
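
To make the three regions concrete, the following Python sketch computes the statistic (1) and applies the one-sided and two-sided rules. It is an illustration under stated assumptions, not part of the paper: the model is taken to be normal with unknown mean $\theta$ and unit variance, for which $\frac{\partial}{\partial \theta} \log f(x, \theta) = x - \theta$ and $y_n(\theta_0)$ is exactly standard normal under $\theta = \theta_0$, so the constants are normal quantiles. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def score_statistic(xs, theta0, score):
    """y_n(theta0) = n^(-1/2) * sum of d/dtheta log f(x, theta) evaluated at theta0."""
    return sum(score(x, theta0) for x in xs) / sqrt(len(xs))


def normal_mean_score(x, theta):
    """Score of the assumed N(theta, 1) model: d/dtheta log f(x, theta) = x - theta."""
    return x - theta


def reject_upper(y, alpha):
    """Region W'_n: reject when y_n(theta0) >= c', for alternatives theta >= theta0."""
    return y >= NormalDist().inv_cdf(1.0 - alpha)


def reject_lower(y, alpha):
    """Region W''_n: reject when y_n(theta0) <= c'', for alternatives theta <= theta0."""
    return y <= NormalDist().inv_cdf(alpha)


def reject_two_sided(y, alpha):
    """Region W_n: reject when |y_n(theta0)| >= c."""
    return abs(y) >= NormalDist().inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    xs = [3.0, 3.0, 3.0, 3.0]  # hypothetical observations
    y = score_statistic(xs, theta0=2.0, score=normal_mean_score)
    print(y, reject_upper(y, 0.05), reject_lower(y, 0.05), reject_two_sided(y, 0.05))
```

For other densities the score function changes and the constants would in general only be approximately normal quantiles for large $n$.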

2. Assumptions on the density function. Let $\omega$ be a subset of the real axis.
Denote by $\theta^*$ a real variable which takes only values in $\omega$ and let $\theta$ be a variable
which can take any real value. For any function $\psi(x)$ we denote by $E_\theta \psi(x)$ the
expected value of $\psi(x)$ under the assumption that $\theta$ is the true value of the
parameter, i.e.

$$E_\theta \psi(x) = \int_{-\infty}^{+\infty} \psi(x) f(x, \theta)\, dx.$$

For any $x$, for any positive $\delta$ and for any real value $\theta_1$ denote by $\varphi_1(x, \theta_1, \delta)$ the
greatest lower bound, and by $\varphi_2(x, \theta_1, \delta)$ the least upper bound of $\frac{\partial^2}{\partial \theta^2} \log f(x, \theta)$
in the interval $\theta_1 - \delta \leq \theta \leq \theta_1 + \delta$. In all that follows the symbol $\theta_i^*$, for
any integer $i$, will denote a value of $\theta^*$, i.e., $\theta_i^*$ is a point of $\omega$.

We say that a value $\theta$ lies in the $\epsilon$-neighborhood of $\omega$ if there exists a value $\theta^*$
such that $|\theta - \theta^*| \leq \epsilon$.

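Throughout the paper the following assumptions on $f(x, \theta)$ will be made:

Assumption 1: For any pair of sequences $\{\theta_n\}$ and $\{\theta_n^*\}$ ($n = 1, 2, \ldots$, ad inf.)
for which

$$\lim_{n \to \infty} E_{\theta_n} \frac{\partial}{\partial \theta} \log f(x, \theta_n^*) = 0$$

also

$$\lim_{n \to \infty} (\theta_n - \theta_n^*) = 0.$$

Furthermore there exists a positive $\epsilon$ such that $E_\theta \left[ \frac{\partial}{\partial \theta} \log f(x, \theta_1) \right]^2$ is a bounded
function of $\theta$ and $\theta_1$, $E_\theta \frac{\partial}{\partial \theta} \log f(x, \theta_1)$ is a continuous function of $\theta$ and $\theta_1$ and
$E_{\theta_1} \left[ \frac{\partial}{\partial \theta} \log f(x, \theta_1) \right]^2 = d(\theta_1)$ has a positive lower bound, where $\theta_1$ can take any value
in the $\epsilon$-neighborhood of $\omega$.

Assumption 2: There exists a positive $k_0$ such that $E_{\theta_1} \varphi_1(x, \theta_2, \delta)$ and
$E_{\theta_1} \varphi_2(x, \theta_2, \delta)$ are uniformly continuous functions in the domain $D$ defined as
follows: the variables $\theta_1$ and $\theta_2$ may take any value in the $k_0$-neighborhood of $\omega$ and $\delta$
may take any value for which $|\delta| \leq k_0$. Furthermore it is assumed that

$$E_{\theta_1} [\varphi_i(x, \theta_2, \delta)]^2, \qquad (i = 1, 2)$$

are bounded functions of $\theta_1$, $\theta_2$ and $\delta$ in $D$.

Assumption 3: There exists a positive $k_0$ such that

$$\int_{-\infty}^{+\infty} \frac{\partial}{\partial \theta} f(x, \theta)\, dx = \int_{-\infty}^{+\infty} \frac{\partial^2}{\partial \theta^2} f(x, \theta)\, dx = 0$$

for all $\theta$ in the $k_0$-neighborhood of $\omega$.

Assumption 3 means simply that we may differentiate with respect to $\theta$ under
the integral sign. In fact,

$$\int_{-\infty}^{+\infty} f(x, \theta)\, dx = 1,$$

identically in $\theta$. Hence

$$\frac{\partial}{\partial \theta} \int_{-\infty}^{+\infty} f(x, \theta)\, dx = \frac{\partial^2}{\partial \theta^2} \int_{-\infty}^{+\infty} f(x, \theta)\, dx = 0.$$

Differentiating under the integral sign we obtain the relations in Assumption 3.

Assumption 4: There exists a positive $k_0$ and a positive $\eta$ such that

$$E_\theta \left| \frac{\partial}{\partial \theta} \log f(x, \theta) \right|^{2 + \eta}$$

is a bounded function of $\theta$ in the $k_0$-neighborhood of $\omega$.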

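3. Some propositions. Proposition 1: To any positive $\beta$ there exists a positive
$\gamma$ such that

$$\lim_{n \to \infty} P[\,|y_n(\theta^*)| > \gamma \mid \theta\,] = 1$$

uniformly in $\theta^*$ and for all $\theta$ for which $|\theta - \theta^*| \geq \beta$.

Proof: From Assumption 1 it follows that $\left| E_\theta \frac{\partial}{\partial \theta} \log f(x, \theta^*) \right|$ has a positive
lower bound in the domain $|\theta - \theta^*| \geq \beta$. Since according to Assumption 1
$E_\theta \left[ \frac{\partial}{\partial \theta} \log f(x, \theta^*) \right]^2$ is a bounded function of $\theta$ and $\theta^*$, Proposition 1 easily follows.

Proposition 2: There exists a positive $\epsilon$ such that

$$\lim_{n \to \infty} \{ P[y_n(\theta) < t \mid \theta] - N(t \mid \theta) \} = 0$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$ where

$$d(\theta) = -E_\theta \frac{\partial^2}{\partial \theta^2} \log f(x, \theta) = E_\theta \left[ \frac{\partial}{\partial \theta} \log f(x, \theta) \right]^2 \tag{2}$$

and

$$N(t \mid \theta) = \frac{1}{\sqrt{2 \pi d(\theta)}} \int_{-\infty}^{t} e^{-\frac{v^2}{2 d(\theta)}}\, dv. \tag{3}$$

Proposition 2 follows easily from Assumptions 3 and 4 and the general limit
theorems.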

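Proposition 3: There exists a positive $\epsilon$ such that for any bounded sequence $\{\mu_n\}$

$$\lim_{n \to \infty} \left\{ P\left[ y_n(\theta) < t \,\Big|\, \theta + \frac{\mu_n}{\sqrt{n}} \right] - \int_{-\infty}^{t - \mu_n d(\theta)} dN(v \mid \theta) \right\} = 0$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$.

Proof: We have

$$y_n\left( \theta + \frac{\mu_n}{\sqrt{n}} \right) = y_n(\theta) + \frac{\mu_n}{\sqrt{n}} \cdot \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial \theta^2} \log f(x_\alpha, \theta_n') \tag{4}$$

where $\theta_n'$ lies in the interval $\left[ \theta, \theta + \frac{\mu_n}{\sqrt{n}} \right]$. From Assumption 2 and the above
equation we easily obtain

$$\lim_{n \to \infty} \left\{ P\left[ y_n\left( \theta + \frac{\mu_n}{\sqrt{n}} \right) < t \,\Big|\, \theta + \frac{\mu_n}{\sqrt{n}} \right] - P\left[ y_n(\theta) - \mu_n d(\theta) < t \,\Big|\, \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{5}$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$. From Proposition 2
and (5) we get

$$\lim_{n \to \infty} \left\{ \int_{-\infty}^{t} dN(v \mid \theta) - P\left[ y_n(\theta) < t + \mu_n d(\theta) \,\Big|\, \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0$$

or

$$\lim_{n \to \infty} \left\{ \int_{-\infty}^{t - \mu_n d(\theta)} dN(v \mid \theta) - P\left[ y_n(\theta) < t \,\Big|\, \theta + \frac{\mu_n}{\sqrt{n}} \right] \right\} = 0 \tag{6}$$

uniformly in $t$ and for all $\theta$ in the $\epsilon$-neighborhood of $\omega$. This proves
Proposition 3.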
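
Proposition 3 says that under the nearby parameter value $\theta + \mu/\sqrt{n}$ the statistic $y_n(\theta)$ behaves in the limit like a normal variate with mean $\mu d(\theta)$ and variance $d(\theta)$. The Python sketch below evaluates this limiting law and the resulting limiting rejection probability of a one-sided region. It is an illustration only: the function names are hypothetical, and $d$ is passed in as a plain positive number rather than computed from a density.

```python
from math import sqrt
from statistics import NormalDist


def limiting_cdf(t, mu, d):
    """Limit of P[y_n(theta) < t | theta + mu/sqrt(n)]: the N(mu*d, d) law at t.

    This equals the integral of dN(v | theta) from -infinity to t - mu*d.
    """
    return NormalDist(mu * d, sqrt(d)).cdf(t)


def local_power_upper(mu, d, alpha):
    """Limiting probability of the region y_n(theta) >= c, with c set for size alpha."""
    c = sqrt(d) * NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - limiting_cdf(c, mu, d)


if __name__ == "__main__":
    for mu in (0.0, 0.5, 1.0, 2.0):
        print(mu, round(local_power_upper(mu, d=1.0, alpha=0.05), 4))
```

At $\mu = 0$ the function returns the size $\alpha$, and it increases with $\mu$, which is the behavior one expects of the limiting power of the one-sided region against alternatives above $\theta$.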

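Proposition 4: There exists a positive $\epsilon$ such that for any positive $\gamma$ and for
any sequence $\{\mu_n\}$ for which $\lim |\mu_n| = \infty$

$$\lim_{n \to \infty} P\left[\, |y_n(\theta^*)| > \gamma \,\Big|\, \theta^* + \frac{\mu_n}{\sqrt{n}} \right] = 1$$

uniformly in $\theta^*$.

Proof: If there exists a positive $\beta$ such that $\frac{|\mu_n|}{\sqrt{n}} > \beta$ for almost all $n$,
Proposition 4 follows from Proposition 1. Hence we have to consider only the
case $\lim_{n \to \infty} \frac{\mu_n}{\sqrt{n}} = 0$. Since

$$E_{\theta^* + (\mu_n/\sqrt{n})}\, y_n\left( \theta^* + \frac{\mu_n}{\sqrt{n}} \right) = 0$$

we get from (4)

$$E_{\theta^* + (\mu_n/\sqrt{n})} [y_n(\theta^*)] + \mu_n\, E_{\theta^* + (\mu_n/\sqrt{n})} \left[ \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial \theta^2} \log f(x_\alpha, \theta_n') \right] = 0. \tag{7}$$

Since $\lim \frac{\mu_n}{\sqrt{n}} = 0$, we have on account of Assumption 2

$$\lim_{n \to \infty} E_{\theta^* + (\mu_n/\sqrt{n})} \left[ \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial \theta^2} \log f(x_\alpha, \theta_n') \right] = E_{\theta^*} \frac{\partial^2}{\partial \theta^2} \log f(x, \theta^*) = -E_{\theta^*} \left[ \frac{\partial}{\partial \theta} \log f(x, \theta^*) \right]^2 = -d(\theta^*)$$

uniformly in $\theta^*$. According to Assumption 1 $d(\theta^*)$ has a positive lower bound;
hence on account of $\lim |\mu_n| = \infty$ we obtain from (7)

$$\lim_{n \to \infty} \left| E_{\theta^* + (\mu_n/\sqrt{n})}\, y_n(\theta^*) \right| = \infty \tag{8}$$

uniformly in $\theta^*$. The variance of $y_n(\theta^*)$ is equal to the variance of $\frac{\partial}{\partial \theta} \log f(x, \theta^*)$.
On account of Assumption 1 the variance of $\frac{\partial}{\partial \theta} \log f(x, \theta^*)$ (under the assumption
that $\theta^* + \frac{\mu_n}{\sqrt{n}}$ is the true value of the parameter) is a bounded function. Hence
Proposition 4 is proved on account of (8).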

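Proposition 5: Let $\{W_n(\theta^*)\}$ be a sequence of regions of size $\alpha$, i.e.
$P[W_n(\theta^*) \mid \theta^*] = \alpha$, and let $V_n(\theta^*, y)$ be the region defined by the inequality
$y_n(\theta^*) < y$. Let $U_n(\theta^*, y)$ be the intersection of $V_n(\theta^*, y)$ and $W_n(\theta^*)$ and denote
$P[U_n(\theta^*, y) \mid \theta^*]$ by $F_n(y \mid \theta^*)$. Denote furthermore $P\left[ W_n(\theta^*) \,\Big|\, \theta^* + \frac{\mu}{\sqrt{n}} \right]$ by
$G(\theta^*, \mu, n)$. If $\{\theta_n^*\}$ and $\{\mu_n\}$ are two sequences such that $\lim d(\theta_n^*) = d$,
$\lim F_n(y \mid \theta_n^*) = F(y)$ uniformly in $y$, and $\lim \mu_n = \mu$, then

$$\lim_{n \to \infty} G(\theta_n^*, \mu_n, n) = \int_{-\infty}^{\infty} e^{\mu y - \frac{1}{2} \mu^2 d}\, dF(y).$$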

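Proof: Let $\lim \mu_n = \mu$ and consider the Taylor expansion

$$\sum_{\alpha} \log f\left( x_\alpha, \theta^* + \frac{\mu_n}{\sqrt{n}} \right) = \sum_{\alpha} \log f(x_\alpha, \theta^*) + \frac{\mu_n}{\sqrt{n}} \sum_{\alpha} \frac{\partial}{\partial \theta} \log f(x_\alpha, \theta^*) + \frac{\mu_n^2}{2n} \sum_{\alpha} \frac{\partial^2}{\partial \theta^2} \log f(x_\alpha, \theta_n') \tag{9}$$

where $\theta_n'$ lies in the interval $\left[ \theta^*, \theta^* + \frac{\mu_n}{\sqrt{n}} \right]$. From this we easily get on account
of Assumption 2 and the fact that $|\mu_n|$ is bounded

$$\log \frac{\prod_{\alpha=1}^{n} f\left( x_\alpha, \theta^* + \frac{\mu_n}{\sqrt{n}} \right)}{\prod_{\alpha=1}^{n} f(x_\alpha, \theta^*)} = \mu_n y_n(\theta^*) - \frac{1}{2} \mu_n^2 d(\theta^*) + \rho(\theta^*, n) \tag{10}$$

where for arbitrary positive $\eta$

$$\lim_{n \to \infty} P\left\{ |\rho(\theta^*, n)| < \eta \,\Big|\, \theta^* + \frac{\mu_n}{\sqrt{n}} \right\} = 1 \tag{11}$$

uniformly in $\theta^*$. Denote by $R_n(\theta^*)$ the region defined by

$$|\rho(\theta^*, n)| < \eta, \qquad \eta > 0. \tag{12}$$

On account of (11) we have

$$\lim_{n \to \infty} P\left[ R_n(\theta^*) \,\Big|\, \theta^* + \frac{\mu_n}{\sqrt{n}} \right] = 1, \tag{13}$$

uniformly in $\theta^*$. Denote the intersection of $R_n(\theta^*)$ and $W_n(\theta^*)$ by $Q_n(\theta^*)$, and
the intersection of $R_n(\theta^*)$ and $U_n(\theta^*, y)$ by $T_n(\theta^*, y)$. Furthermore denote
$P[T_n(\theta^*, y) \mid \theta^*]$ by $\bar{F}_n(y \mid \theta^*)$. Then we have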


( 14 ) 


, \0*)<p ^Tn(.9*, t) I + :^] 
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for all values of t and 6*. Furthermore we obviously have 
(16) Um „n,n)-P [^Qn(fl*) 1 6* + = 0 

uniformly in 6*f and 

(16) lim[Pn{t\d*) -^Fn{t\e*)]^0 


uniformly in $\theta^*$ and $t$. Since $\eta$ may be chosen arbitrarily small, it follows from
(14) and (15) that to any $\epsilon > 0$, $\eta$ may be chosen such that


(17) 


lim sup 


G{el , n) - r* dP„it I el) 


for any sequence $\{\theta_n^*\}$.

To each $\epsilon$ let $L_\epsilon$ be a positive number such that $L_\epsilon$ depends only on $\epsilon$ and


(18) 


j[ dNit 1 e*) + J" '»<*•> dN(t 1 9*) < ^ 


for all $n$ and for all values of $\theta^*$. Since $d(\theta^*)$ has a positive lower and a finite
upper bound, it is easy to verify that such an $L_\epsilon$ exists. From (18) and
Proposition 3 it follows


(19) 


MmsapipfyM) < -L.\dt + -1^ 

L ■y/nJ 


’ I 


+ P I J/n(^ *) > L,\0t + 




for any arbitrary sequence $\{\theta_n^*\}$. Since the difference $U_n(\theta^*, t_2) - U_n(\theta^*, t_1)$ is
a subset of the difference $V_n(\theta^*, t_2) - V_n(\theta^*, t_1)$ and since $T_n(\theta^*, t_2) - T_n(\theta^*, t_1)$
is a subset of $U_n(\theta^*, t_2) - U_n(\theta^*, t_1)$ for $t_2 > t_1$, we get from (18) and (19)


( 20 ) 


lim sup Ip Un{el , —Lt) I e* 

n-+oo L 


* ^ Mn 


y/n 


l + PrTF„((?!)|e; + 

J L V«J 


- p[t7„((?;,L.)|o: + ^ 

lim_sup |p[7’„(o: , -Lt) I el + + p[Q,(e:) | ol + ^ 

for any sequence $\{\theta_n^*\}$. On account of (14) we get from (21)

( 22 ) c-’ lim sup I f dPnit \ $1) + f* dPnit \ «!)) < 5 . 

n-*90 J In J 2 


and 


( 21 ) 
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From (17) and (22) we obtain 


(23) lim sup 0(e*n , M« , n) - jf dPM | o!) | 

for any sequence $\{\theta_n^*\}$. Consider now the sequence $\{\theta_n^*\}$ which satisfies the
conditions of Proposition 5. Since $F_n(t \mid \theta_n^*)$ converges to $F(t)$ uniformly in $t$,
on account of (16) also $\bar{F}_n(t \mid \theta_n^*)$ converges to $F(t)$ uniformly in $t$. Hence we
obtain from (23)


(24) 


lim sup G(0* , nn , 



dF(() 



Since $\epsilon$ and $\eta$ may be chosen arbitrarily small, Proposition 5 follows from (24).


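4. Some theorems and corollaries. Theorem 1: Denote by $S_n(\theta^*)$ the region
defined by the inequality $y_n(\theta^*) \geq A_n(\theta^*)$ where $A_n(\theta^*)$ is chosen such that
$P[S_n(\theta^*) \mid \theta^*] = \alpha$. For any region $W_n(\theta^*)$ denote by $L_n[W_n(\theta^*)]$ the least upper
bound of $P[W_n(\theta^*) \mid \theta] - P[S_n(\theta^*) \mid \theta]$ with respect to $\theta^*$ and $\theta$, where $\theta$ is restricted
to values $\geq \theta^*$. Then for any sequence $\{W_n(\theta^*)\}$ for which $P[W_n(\theta^*) \mid \theta^*] = \alpha$,

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \leq 0.$$

Proof: Assume that Theorem 1 is not true. Then there exists a sequence
of integers $\{n'\}$, a sequence $\{\theta_{n'}^*\}$ and a sequence $\{\theta_{n'}\}$ ($\theta_{n'} \geq \theta_{n'}^*$) such that

$$\lim_{n' \to \infty} \{ P[W_{n'}(\theta_{n'}^*) \mid \theta_{n'}] - P[S_{n'}(\theta_{n'}^*) \mid \theta_{n'}] \} = \delta > 0. \tag{25}$$

On account of Proposition 2 and Assumption 2 the sequence $\{A_{n'}(\theta_{n'}^*)\}$ is
bounded. Then it follows easily from (25) and Proposition 4 (taking into account
that $E_\theta \frac{\partial}{\partial \theta} \log f(x, \theta^*) > 0$ for $\theta > \theta^*$) that

$$(\theta_{n'} - \theta_{n'}^*) \sqrt{n'} = \mu_{n'} \geq 0 \tag{26}$$

must be bounded. Denote by $\{n''\}$ a subsequence of $\{n'\}$ such that

$$\lim d(\theta_{n''}^*) = d, \tag{27}$$

$$\lim \mu_{n''} = \mu, \tag{28}$$

and

$$\lim F_{n''}(t \mid \theta_{n''}^*) = F(t) \tag{29}$$

uniformly in $t$, where

$$F_n(t \mid \theta^*) = P[U_n(\theta^*, t) \mid \theta^*]$$

and $U_n(\theta^*, t)$ is the intersection of $W_n(\theta^*)$ and the region $y_n(\theta^*) < t$. The
existence of a subsequence $\{n''\}$ such that (29) holds follows from the fact that

$$F_n(t_2 \mid \theta^*) - F_n(t_1 \mid \theta^*) \leq \Phi_n(t_2 \mid \theta^*) - \Phi_n(t_1 \mid \theta^*) \quad \text{for } t_2 > t_1, \tag{30}$$

and

$$\lim \Phi_{n''}(t \mid \theta_{n''}^*) = \frac{1}{\sqrt{2 \pi d}} \int_{-\infty}^{t} e^{-\frac{v^2}{2d}}\, dv = N(t), \tag{31}$$

where $\Phi_n(t \mid \theta^*)$ denotes the probability $P[y_n(\theta^*) < t \mid \theta^*]$. Furthermore it can
easily be shown that

$$\int_{-\infty}^{\infty} dF(t) = \alpha. \tag{32}$$

On account of Proposition 5 we get from (25), (27), (28), (29), (30) and (31)

$$\int_{-\infty}^{\infty} e^{\mu t - \frac{1}{2} \mu^2 d}\, dF(t) - \int_{A}^{\infty} e^{\mu t - \frac{1}{2} \mu^2 d}\, dN(t) = \delta, \tag{33}$$

where $A$ denotes a value such that

$$\int_{A}^{\infty} dN(t) = \alpha.$$

It has been shown in a previous paper³ that (33) leads to a contradiction. Hence
Theorem 1 is proved.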

Theorem 2: Denote by $S_n(\theta^*)$ the region defined by the inequality $y_n(\theta^*) \leq A_n(\theta^*)$
where $A_n(\theta^*)$ is chosen such that $P[S_n(\theta^*) \mid \theta^*] = \alpha$. For any region
$W_n(\theta^*)$ denote by $L_n[W_n(\theta^*)]$ the least upper bound of

$$P[W_n(\theta^*) \mid \theta] - P[S_n(\theta^*) \mid \theta]$$

with respect to $\theta^*$ and $\theta$, where $\theta$ is restricted to values $\leq \theta^*$. Then for any sequence
$\{W_n(\theta^*)\}$ for which $P[W_n(\theta^*) \mid \theta^*] = \alpha$,

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \leq 0.$$

The proof is omitted, since it is analogous to that of Theorem 1.

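Theorem 3: Let $\{W_n(\theta^*)\}$ be a sequence of regions for which
$P[W_n(\theta^*) \mid \theta^*] = \alpha$ and $\lim_{n \to \infty} g[W_n(\theta^*)] = \alpha$ uniformly in $\theta^*$. Denote by $L_n[W_n(\theta^*)]$
the least upper bound of

$$P[W_n(\theta^*) \mid \theta] - P[\,|y_n(\theta^*)| \geq A_n(\theta^*) \mid \theta\,]$$

with respect to $\theta$ and $\theta^*$, where $A_n(\theta^*)$ is chosen such that

$$P[\,|y_n(\theta^*)| \geq A_n(\theta^*) \mid \theta^*\,] = \alpha.$$

Then

$$\limsup_{n \to \infty} L_n[W_n(\theta^*)] \leq 0.$$

³ See p. 12 of the paper cited in ².

Proof: Denote $P[y_n(\theta^*) < t \mid \theta^*]$ by $\Phi_n(t \mid \theta^*)$ and denote by $F_n(t \mid \theta^*)$ the
probability (under the hypothesis $\theta = \theta^*$) of the intersection of $W_n(\theta^*)$ with
the region $y_n(\theta^*) < t$. Assume that Theorem 3 is not true. Then there exists
a subsequence $\{n''\}$, a sequence $\{\theta_{n''}^*\}$ and a sequence $\{\theta_{n''}\}$ such that

$$\lim d(\theta_{n''}^*) = d; \qquad \lim (\theta_{n''} - \theta_{n''}^*) \sqrt{n''} = \lim \mu_{n''} = \mu;$$

$$\lim F_{n''}(t \mid \theta_{n''}^*) = F(t)$$

uniformly in $t$; and

$$\int_{-\infty}^{\infty} e^{\mu t - \frac{1}{2} \mu^2 d}\, dF(t) - \int_{-\infty}^{-A} e^{\mu t - \frac{1}{2} \mu^2 d}\, dN(t) - \int_{A}^{\infty} e^{\mu t - \frac{1}{2} \mu^2 d}\, dN(t) = \delta > 0 \tag{34}$$

where $A$ is a positive number such that

$$\frac{1}{\sqrt{2 \pi d}} \int_{-A}^{A} e^{-\frac{t^2}{2d}}\, dt = 1 - \alpha.$$

This can be proved in the same way as (33) has been proved. The author has
shown in a previous paper⁴ that (34) leads to a contradiction. Hence Theorem
3 is proved.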

Theorem 4: Denote by $A_n(\theta^*)$ the region of type^ A of size $\alpha$ for testing the hypothesis $\theta = \theta^*$. Denote by $B_n(\theta^*)$ the region $|y_n(\theta^*)| > C_n(\theta^*)$, where $C_n(\theta^*)$ is determined such that

$P[\,|y_n(\theta^*)| > C_n(\theta^*) \mid \theta^*] = \alpha.$


Then, under the assumption that E« 


[^log/(*, »•)] 


is bounded, 


$\lim_{n \to \infty} \{P[A_n(\theta^*) \mid \theta] - P[B_n(\theta^*) \mid \theta]\} = 0$

uniformly in $\theta$ and $\theta^*$.

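Proof: The region $A_n(\theta^*)$ is given by the inequality^

(35) $\left[\sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*)\right]^2 + \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) > k'_n(\theta^*) \left[\sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta^*)\right] + k''_n(\theta^*),$

where $k'_n(\theta^*)$ and $k''_n(\theta^*)$ are chosen such that $A_n(\theta^*)$ should be unbiased and of size $\alpha$. The inequality (35) can be written also in the form

(36) $[y_n(\theta^*)]^2 + \frac{1}{n} \sum_{\alpha=1}^{n} \frac{\partial^2}{\partial\theta^2} \log f(x_\alpha, \theta^*) > l'_n(\theta^*)\,y_n(\theta^*) + l''_n(\theta^*).$
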

* See p. 14 of the paper cited in *. 

^Neyman, J. and Pearson, E. S., "Contributions to the theory of testing statistical hypotheses," Stat. Res. Mem., Vol. 1.

• See the paper cited in ». 
Let $\{\mu_n\}$ be a bounded sequence. From Assumption 2 it follows that for any positive $\epsilon$

(37) ^ ^ <*!»* + :^} = 1 

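uniformly in $\theta^*$. Since (37) holds for arbitrarily small $\epsilon$, we get easily on account of Proposition 3

(38) $\lim_{n \to \infty} \{P[A_n(\theta^*) \mid \theta^* + \mu_n/\sqrt{n}] - P[A'_n(\theta^*) \mid \theta^* + \mu_n/\sqrt{n}]\} = 0$

uniformly in $\theta^*$, where $A'_n(\theta^*)$ is defined by

(39) $[y_n(\theta^*)]^2 > l'_n(\theta^*)\,y_n(\theta^*) + l''_n(\theta^*) + d(\theta^*).$

Since $A_n(\theta^*)$ is unbiased and of size $\alpha$, we have on account of (38) and (39)

(40) $\lim l'_n(\theta^*) = 0$ and

(41) $\lim\,[l''_n(\theta^*) + d(\theta^*)] = \lambda(\theta^*) > 0$

uniformly in $\theta^*$, where $\lambda(\theta^*)$ is given by the condition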

(42) _J_ 

'\/2ird{0*) 

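Inequality (39) is obviously equivalent to the pair of inequalities:

$y_n(\theta^*) < c'_n(\theta^*)$ or $y_n(\theta^*) > c''_n(\theta^*),$

where $c'_n(\theta^*)$ and $c''_n(\theta^*)$ are the roots of the equation in $y_n(\theta^*)$

$[y_n(\theta^*)]^2 = l'_n(\theta^*)\,y_n(\theta^*) + l''_n(\theta^*) + d(\theta^*).$

Since $\lim c'_n(\theta^*) = -\sqrt{\lambda(\theta^*)}$ and $\lim c''_n(\theta^*) = +\sqrt{\lambda(\theta^*)}$ uniformly in $\theta^*$, from Proposition 3 it follows that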

lim |p|^A„(«*) I 0* -I- 




dN(t I «*) - f dNit I fl*) > = 0 


uniformly in 0*. 


Now let us consider a sequence $\{\nu_n\}$ such that $\lim |\nu_n| = \infty$ and $\lim \nu_n/\sqrt{n} = 0$. We shall prove that

(44) $\lim_{n \to \infty} P[A_n(\theta^*) \mid \theta^* + \nu_n/\sqrt{n}] = 1$

uniformly in e*. Since Et log f(x, tf*) j is assumed to be bounded, 

(45) 

lo8/(*. ®*) J 

and 

(46) ^..-K../v^r,[|jog/CT,tf*)J 

are bounded functions of 6* and n. We get by Taylor expansion 

S ? los /fa. .»•) - E ^ log /(».,«• + ^) 

(47) , 

;;^E|;iogA».,s:> 

where st lies in (9*, d* + . Hence 

L vnJ f 

(48) E9*M>>J^)[yn{d'^)] = - I'n Ee*+(pjy/^) ^ log f{Xa , j . 

From Assumption 2 and $\lim |\nu_n| = \infty$ it follows that the absolute value of the right hand side of (48) converges to $\infty$. Hence

$\lim\,|E_{\theta^* + \nu_n/\sqrt{n}}[y_n(\theta^*)]| = \infty.$

Since on account of Assumption 1 

l0g/(Xa, ^*)J 

is a bounded function of $n$ and $\theta^*$, also the variance of $y_n(\theta^*)$ (under the assumption that $\theta = \theta^* + \nu_n/\sqrt{n}$ is the true value of the parameter) is a bounded function of $n$ and $\theta^*$. Hence for any arbitrarily large constant $C$

(49) $\lim P[\,|y_n(\theta^*)| > C \mid \theta^* + \nu_n/\sqrt{n}] = 1,$

uniformly in $\theta^*$. The equation (44) follows easily from (36), (40), (41), (45), (46) and (49).
Consider a sequence $\{\nu_n\}$ such that $|\nu_n/\sqrt{n}| > \beta > 0$ for all $n$. Then it follows easily from Proposition 1 that for any arbitrary $C$

(50) $\lim P[\,|y_n(\theta^*)| > C \mid \theta^* + \nu_n/\sqrt{n}] = 1$

uniformly in 6*. Since log /(®« > ^*)J is assumed to be bounded, and 
therefore also E) — log /(», 0*) is bounded, there exists a finite g such that 


(51) 




uniformly in $\theta^*$. From (36), (40), (41), (50) and (51) it follows

(52) $\lim P[A_n(\theta^*) \mid \theta^* + \nu_n/\sqrt{n}] = 1$

uniformly in $\theta^*$. Since on account of Propositions 3 and 4, the relations (43), (44) and (52) hold if we substitute $B_n(\theta^*)$ for $A_n(\theta^*)$, Theorem 4 is proved.

If Assumptions 1-4 are fulfilled for the set $\omega$ consisting of the single point $\theta = \theta_0$, then we get from Theorems 1-4 the following corollaries:

Corollary 1: Let $W'_n$ be the region defined by the inequality $y_n(\theta_0) > c'_n$, $W''_n$ defined by the inequality $y_n(\theta_0) < c''_n$, and $W_n$ defined by the inequality $|y_n(\theta_0)| > c_n$, where the constants $c'_n$, $c''_n$ and $c_n$ are chosen such that

$P(W'_n \mid \theta_0) = P(W''_n \mid \theta_0) = P(W_n \mid \theta_0) = \alpha.$

Then $\{W'_n\}$ is an asymptotically most powerful test of the hypothesis $\theta = \theta_0$ if $\theta$ takes only values $> \theta_0$. Similarly $\{W''_n\}$ is an asymptotically most powerful test if $\theta$ takes only values $< \theta_0$. Finally $\{W_n\}$ is an asymptotically most powerful unbiased test if $\theta$ can take any real value.

Corollary 2: The sequence $\{A_n(\theta_0)\}$ is an asymptotically most powerful unbiased test of the hypothesis $\theta = \theta_0$, where $A_n(\theta_0)$ denotes the critical region of type A for testing $\theta = \theta_0$.
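
The three regions of Corollary 1 can be illustrated on a simple family. The Python sketch below is only an illustration under stated assumptions: it takes $f(x, \theta)$ to be the normal density with mean $\theta$ and unit variance (an example not treated in the paper), and it assumes that $y_n(\theta)$ denotes the normalized score $n^{-1/2}\sum_\alpha \partial \log f(x_\alpha, \theta)/\partial\theta$, as the form of (35) and (36) suggests. In this family $y_n(\theta_0)$ is exactly standard normal under $\theta_0$, so the constants have size $\alpha$ for every $n$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # exact law of y_n(theta0) under theta0 in this example

def score_statistic(sample, theta0):
    """y_n(theta0) = n^(-1/2) * sum of scores; for N(theta, 1) the score is x - theta0."""
    return sum(x - theta0 for x in sample) / math.sqrt(len(sample))

def critical_constants(alpha):
    """Return (c'_n, c''_n, c_n) so that each region has size alpha (assumed example)."""
    c_upper = STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    c_two_sided = STANDARD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return c_upper, -c_upper, c_two_sided

def regions(sample, theta0, alpha):
    """Report which of the three regions of Corollary 1 contain the sample point."""
    y = score_statistic(sample, theta0)
    c1, c2, c = critical_constants(alpha)
    return {"W_prime": y > c1, "W_double_prime": y < c2, "W": abs(y) > c}

def power_two_sided(theta, theta0, n, alpha):
    """P(W_n | theta); here y_n(theta0) is N(sqrt(n) * (theta - theta0), 1)."""
    _, _, c = critical_constants(alpha)
    shift = math.sqrt(n) * (theta - theta0)
    return STANDARD_NORMAL.cdf(-c - shift) + 1.0 - STANDARD_NORMAL.cdf(c - shift)

print(regions([0.9, 1.4, 0.2, 1.1], theta0=0.0, alpha=0.05))
print(power_two_sided(0.5, 0.0, n=25, alpha=0.05))
```

At $\theta = \theta_0$ the power function returns the size $\alpha$, and it increases as $\theta$ moves away from $\theta_0$ in either direction, which is the unbiasedness property the two-sided region is required to have.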



ON THE DISTRIBUTION OF THE QUOTIENT OF TWO CHANCE VARIABLES

By J. H. Curtiss

Cornell University 

1. Introduction. Although the quotient of two chance variables appears fre- 
quently in mathematical statistics, the methods used in the literature to derive 
the distributions of quotients have usually been special ones devised for the 
particular variables under consideration, and in no way indicative of the general 
result. It is the purpose of this paper to study the distribution of the quotient 
of two variables for itself alone, with attention first to the question of existence, 
and then to the accurate derivation of a number of general formulas for the 
frequency function and d.f.¹ The principal formulas which we shall derive may
be described briefly as follows (the numerals refer to the equation numbers in 
the text) : 

(3.1). The frequency function of the quotient of two variables which have an absolutely continuous joint probability function.

(4.11), (4.12). The d.f. of the quotient of a pair of arbitrary independent variables, expressed in terms of the d.f.'s of these variables.

(5.2). The d.f. of the quotient of a pair of arbitrary independent variables, expressed in terms of the c.f.'s² of these variables.

(6.4). The limiting form of the d.f. of a quotient of two sums of arbitrary identical independent variables.

(7.1). A formula analogous to (3.1) for the product of two chance variables.

(7.2). A formula analogous to (4.11) for the product of two chance variables.

2. The existence of the quotient distribution. The function $Z = X/Y$ is a continuous function of $X$ and $Y$, finite and uniquely defined for all points $(X, Y)$ such that $Y \ne 0$. Therefore if $P\{Y = 0\} = 0$, the pr.f.³ $P(S)$ of the joint distribution of $X$ and $Y$ determines a probability distribution for $Z$ (see [1, pp. 12-13]). To avoid irrelevant difficulties, we shall assume in the sequel that $P\{Y = 0\} = 0$ unless definite statement is made to the contrary. This assumption involves no real restriction on our work, for in situations in which, a priori, the assumption is not fulfilled, we can always replace the distribution

¹ I.e., distribution function. The underlying axioms, terminology, and abbreviations in this paper are uniform with those of Cramér's book [1]. For the definition of d.f., see [1, p. 11].

² I.e., characteristic functions. See [1, p. 23].

³ I.e., probability function; [1, p. 9].

of $Y$ by the conditional distribution of $Y$ relative to the hypothesis that $Y \ne 0$.
In such cases, then, the distribution of Z which we are about to study is to be 
interpreted as a conditional distribution relative to this hypothesis. 

We shall suppose that the space of $X$ is the x-axis, that of $Y$, the y-axis, and that of $Z$, the z-axis. It is quite readily seen that the set of points in the $(x, y)$ plane which corresponds to the set $Z \le z$ consists of

(i) the infinite region⁴ in the upper half-plane which is bounded by the negative x-axis and by the line $x = zy$;

(ii) the infinite region in the lower half-plane bounded by the positive x-axis and the line $x = zy$;

(iii) the line $x = zy$ except for the origin.

Denoting this set by $S_z$, we have

$H(z) = \int_{S_z} dP(S) = P(S_z),$

where $H(z)$ is the d.f. of $Z$. The present paper, from the viewpoint of analysis, is simply a study of the Lebesgue-Stieltjes integral appearing in this equation.


3. The continuous case. Suppose first that $P(S)$ is absolutely continuous. This means that the joint distribution of $X$ and $Y$ has a frequency function $\varphi(x, y)$, which is defined almost everywhere, is non-negative, and has the property that $P(S) = \int_S \varphi(x, y)\,dx\,dy$. In general, this integral must be taken in the Lebesgue sense, but of course if the discontinuities of $\varphi$ form a set of two-dimensional measure zero, and if the Jordan content of any bounded portion of the boundary of $S$ is zero, then this integral is just an ordinary improper double Riemann integral.⁵ In particular, these conditions are fulfilled if $\varphi$ is continuous everywhere and if $S = S_z$.

The transformation $x = uv$, $y = v$, gives a continuous one-to-one map of $S_z$ onto a set of the $(u, v)$ plane which consists of the closed half-plane lying to the left of the line $u = z$, but with the u-axis deleted. The Jacobian of the transformation has the absolute value $|v|$. By the theorem for change of variables in Lebesgue integrals [4, pp. 653-655], we have

$H(z) = \iint_{S_z} \varphi(x, y)\,dx\,dy = \iint_{u \le z} |v|\,\varphi(uv, v)\,du\,dv.$

By Fubini's Theorem [6, pp. 203-208], the last integral can be expressed as a repeated integral. Integrating first with respect to $v$, we obtain this result:

Theorem 3.1: If the joint variable $(X, Y)$ has the frequency function $\varphi(x, y)$, then

$H(z) = \int_{-\infty}^{z} \int_{-\infty}^{\infty} |v|\,\varphi(uv, v)\,dv\,du,$

⁴ I.e., open connected set.
⁵ See [4, pp. 476-478; p. 676].
and consequently $H(z)$ is an absolutely continuous function of $z$. The frequency function of the distribution of $Z$ exists almost everywhere, and is given by the formula

(3.1) $h(z) = H'(z) = \int_{-\infty}^{\infty} |v|\,\varphi(zv, v)\,dv.$

We remark that if $X$ and $Y$ are independent, so that $\varphi(x, y) = f(x) \cdot g(y)$, where $f$ and $g$ are respectively the frequency functions of $X$ and $Y$, then (3.1) may be written in the form

(3.2) $h(z) = \int_{-\infty}^{\infty} |v|\,f(zv)\,g(v)\,dv.$

This case was considered recently by Huntington [6], with the additional restrictions that $g(y) = 0$, $y < 0$, and that $f(x)$ and $g(y)$ be continuous.

All the familiar special quotient distributions of applied mathematical statistics, such as Student's t and Fisher's z, may conveniently and rigorously be derived by means of (3.1) and (3.2); in each case the required result follows immediately after an obvious change of variables in the integrand. We pause here only to point out explicitly the result obtained when $X$ and $Y$ have a normal joint distribution with variances $\sigma_X^2$, $\sigma_Y^2$, and correlation coefficient $\rho$. If the means $E(X)$ and $E(Y)$ are not equal to zero, it is apparently impossible to evaluate (3.1) in closed form; this case has been studied in some detail by Geary [3] and by Fieller [2]. But if $E(X) = E(Y) = 0$, then

$h(z) = \frac{\sigma_X \sigma_Y \sqrt{1 - \rho^2}}{\pi} \cdot \frac{1}{\sigma_Y^2 z^2 - 2\rho\,\sigma_X \sigma_Y z + \sigma_X^2},$

which is the frequency function of a Cauchy distribution with mode at the point $z = \rho\,\sigma_X/\sigma_Y$, the value of the regression coefficient of $X$ on $Y$. If $X$ and $Y$ are independent, then $\rho = 0$, and the frequency function becomes

(3.3) $h(z) = \frac{\sigma_X \sigma_Y}{\pi} \cdot \frac{1}{\sigma_Y^2 z^2 + \sigma_X^2}.$
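
The independent normal case can be checked numerically. The sketch below is an illustration only: it evaluates the integral of $|v| f(zv) g(v)$ over $v$, which is formula (3.2), by a midpoint rule and compares it with the Cauchy form $\sigma_X \sigma_Y / [\pi(\sigma_Y^2 z^2 + \sigma_X^2)]$ of (3.3). The function names, the truncation of the range to $[-v_{\max}, v_{\max}]$, and the step count are assumptions made for this example, not part of the paper.

```python
import math

def normal_pdf(x, sigma):
    """Frequency function of a normal variable with mean zero and s.d. sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def quotient_density(z, f, g, v_max=10.0, steps=20_000):
    """Approximate h(z) = integral of |v| f(zv) g(v) dv, formula (3.2).

    Midpoint rule on [-v_max, v_max]; the truncation is an assumption
    that is adequate only when g has light tails.
    """
    dv = 2.0 * v_max / steps
    total = 0.0
    for k in range(steps):
        v = -v_max + (k + 0.5) * dv
        total += abs(v) * f(z * v) * g(v)
    return total * dv

def cauchy_form(z, sigma_x, sigma_y):
    """Closed form (3.3) for independent zero-mean normal X and Y."""
    return (sigma_x * sigma_y / math.pi) / (sigma_y ** 2 * z ** 2 + sigma_x ** 2)

sigma_x, sigma_y = 2.0, 0.5
for z in (-3.0, 0.0, 1.0, 7.5):
    numeric = quotient_density(
        z, lambda x: normal_pdf(x, sigma_x), lambda y: normal_pdf(y, sigma_y)
    )
    print(z, numeric, cauchy_form(z, sigma_x, sigma_y))
```

The two printed columns should agree closely at every $z$, since the truncated tail is negligible for these values of $\sigma_X$ and $\sigma_Y$.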



4. The quotient of two arbitrary independent variables. We shall henceforth drop the restriction that $P(S)$ be absolutely continuous, but shall suppose instead that $X$ and $Y$ are independent chance variables with one-dimensional distributions of the most general type, except that the distribution of $Y$ will be subject to the restriction that $P\{Y = 0\} = 0$.

We denote the d.f. of $X$ by $F(x)$, that of $Y$ by $G(y)$, and, as usual, that of $Z$ by $H(z)$. It is to be noticed that the condition $P\{Y = 0\} = 0$ implies that $G(y)$ is continuous at the point $y = 0$. Let

(4.1) $f(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \quad g^{+}(t) = \int_{0}^{\infty} e^{ity}\,dG(y), \quad g^{-}(t) = \int_{-\infty}^{0} e^{ity}\,dG(y).$

Clearly

(4.2) $H(z) = P\{X - zY \le 0;\ Y > 0\} + P\{X - zY \ge 0;\ Y < 0\}.$

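We introduce the functions

(4.3)
$\Gamma_1(u) = P\{X - zY \le u;\ Y > 0\} = [1 - G(0)] \cdot P\{X - zY \le u \mid Y > 0\},$⁶
$\gamma_1(t) = \int_{-\infty}^{\infty} e^{itu}\,d\Gamma_1(u),$
$\Gamma_2(u) = P\{zY - X \le u;\ Y < 0\} = G(0) \cdot P\{zY - X \le u \mid Y < 0\},$
$\gamma_2(t) = \int_{-\infty}^{\infty} e^{itu}\,d\Gamma_2(u),$
$\Gamma(u) = \Gamma_1(u) + \Gamma_2(u),$
$\gamma(t) = \int_{-\infty}^{\infty} e^{itu}\,d\Gamma(u) = \gamma_1(t) + \gamma_2(t).$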


By (4.2) and (4.3),

(4.4) $H(z) = \Gamma(0).$

We shall now evaluate $\Gamma_1(u)$ and $\Gamma_2(u)$ in terms of $F(x)$ and $G(y)$, and also $\gamma_1(t)$ and $\gamma_2(t)$ in terms of $f(t)$, $g^{+}(t)$, and $g^{-}(t)$.

Let us assume for a moment that $P\{Y > 0\} \ne 0$; that is, that $G(0) < 1$. The conditional distribution of $Y$ relative to the hypothesis that $Y > 0$ then has the d.f.


(4.5) $G_1(y) = \begin{cases} \dfrac{G(y) - G(0)}{1 - G(0)}, & y \ge 0, \\ 0, & y < 0. \end{cases}$

The d.f. of $-zY$ relative to this hypothesis is $G_1(-y/z)$ if $z < 0$, and $1 - G_1[(-y/z) - 0]$ if $z > 0$.


⁶ By $P(A \mid h)$ is meant the conditional probability of the event $A$ relative to the hypothesis $h$.
It is well known⁷ that the corresponding d.f. of the sum $X + (-zY)$ is given by a convolution of the d.f.'s of $X$ and $-zY$. In the present case, this result takes the form

(4.6) $P\{X - zY \le u \mid Y > 0\} = \begin{cases} \int_{-\infty}^{\infty} F(u - v)\,dG_1(-v/z), & z < 0, \\ \int_{-\infty}^{\infty} F(u - v)\,d\{1 - G_1[(-v/z) - 0]\}, & z > 0. \end{cases}$


Referring to the definition of these Lebesgue-Stieltjes integrals [4, pp. 662-663], we see that the change of variables $w = -v/z$ yields the equations

(4.7) $P\{X - zY \le u \mid Y > 0\} = \begin{cases} \int_{0}^{\infty} F(u + zw)\,dG_1(w), & z < 0, \\ \int_{0}^{\infty} F(u + zw)\,dG_1(w - 0), & z > 0. \end{cases}$


Now the definition of the variation of $G_1(y)$ [4, pp. 341-342] used in forming these Lebesgue-Stieltjes integrals makes no distinction between the variation of $G_1(y)$ and that of $G_1(y - 0)$ over any bounded set contained in an interval of integration $a < y < \infty$, provided that $G_1(y)$ is continuous at $a$ in the two-sided sense. Since $G_1(y)$ is continuous at $y = 0$ in this sense, it is possible to replace $G_1(w - 0)$ by $G_1(w)$ in the second of the two integrals in (4.7).

Equation (4.7) is clearly true for $z = 0$ as well as for all other values of $z$. Referring to (4.5) and (4.3), we see that

$\Gamma_1(u) = \int_{0}^{\infty} F(u + zw)\,dG(w)$, all $z$.


The c.f. of the convolution (4.6) is the product of the c.f.'s of $X$ and of the conditional distribution of $-zY$ [1, p. 36]. This product is $f(t) \cdot \int_{0}^{\infty} e^{-itzy}\,dG_1(y)$. Thus by (4.6), (4.3), and (4.1),

(4.8) $\gamma_1(t) = [1 - G(0)]\left[f(t) \cdot \int_{0}^{\infty} e^{-itzy}\,dG_1(y)\right] = f(t)\,g^{+}(-tz).$

We have established (4.7) and (4.8) under the condition that $P\{Y > 0\} \ne 0$. However, it is obvious that they are trivially true if $P\{Y > 0\} = 0$.

We turn now to $\Gamma_2(u)$. Supposing that $P\{Y < 0\} \ne 0$, the conditional distribution of $Y$ relative to the hypothesis that $Y < 0$ has the d.f.

$G_2(y) = \begin{cases} \dfrac{G(y)}{G(0)}, & y < 0, \\ 1, & y \ge 0. \end{cases}$


⁷ See [1, pp. 36-36]; also [7].
The conditional distribution of $zY$ has the d.f. $G_2(y/z)$ for $z > 0$, and $1 - G_2[(y/z) - 0]$ for $z < 0$. The d.f. of $-X$ is $1 - F(-x - 0)$. Thus

$P\{zY - X \le u \mid Y < 0\} = \begin{cases} \int_{-\infty}^{\infty} \{1 - F[-(u - v) - 0]\}\,d\{1 - G_2[(v/z) - 0]\}, & z < 0, \\ \int_{-\infty}^{\infty} \{1 - F[-(u - v) - 0]\}\,dG_2(v/z), & z > 0, \end{cases}$

$= 1 - \int_{-\infty}^{0} F(zw - u - 0)\,dG_2(w).$


Evidently the first and last members of this equation are equal for $z = 0$ as well as for all other values of $z$. From (4.3) we obtain

$\Gamma_2(u) = G(0) - \int_{-\infty}^{0} F(zw - u - 0)\,dG(w)$, all $z$.

Also, as before,

$\gamma_2(t) = f(-t)\,g^{-}(zt).$

Obviously, the last two equations are still true if $P\{Y < 0\} = 0$.

To summarize, we have shown that

(4.9) $\Gamma(u) = G(0) + \int_{0}^{\infty} F(u + zw)\,dG(w) - \int_{-\infty}^{0} F(zw - u - 0)\,dG(w)$, all $z$;

(4.10) $\gamma(t) = f(t)\,g^{+}(-zt) + f(-t)\,g^{-}(zt).$

Referring now to (4.4) and letting $u = 0$ in (4.9), we are able to state the following theorem:

Theorem 4.1: If $X$ and $Y$ are independent chance variables with respective d.f.'s $F(x)$ and $G(y)$, the d.f. of the quotient $X/Y$ is given by the formula

(4.11) $H(z) = G(0) + \int_{0}^{\infty} F(zw)\,dG(w) - \int_{-\infty}^{0} F(zw - 0)\,dG(w)$

for all values of $z$.
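
As an illustrative check of (4.11), not part of the original argument, the following Python sketch evaluates the formula for two purely discrete variables, where each Stieltjes integral with respect to $G$ reduces to a finite sum over the atoms of $Y$, and compares the result with a direct enumeration of $P\{X/Y \le z\}$. The function names and the two example distributions are hypothetical choices; exact rational arithmetic is used so that points of discontinuity are handled without rounding.

```python
from fractions import Fraction

def cdf(pmf, x, left_limit=False):
    """F(x) = P{X <= x}, or F(x - 0) = P{X < x} when left_limit is True."""
    if left_limit:
        return sum(p for value, p in pmf.items() if value < x)
    return sum(p for value, p in pmf.items() if value <= x)

def quotient_cdf_formula(z, pmf_x, pmf_y):
    """H(z) from (4.11): G(0) + sum over w > 0 of F(zw) - sum over w < 0 of F(zw - 0)."""
    g0 = cdf(pmf_y, 0)  # G(0); equals P{Y < 0} because P{Y = 0} = 0 is assumed
    upper = sum(q * cdf(pmf_x, z * w) for w, q in pmf_y.items() if w > 0)
    lower = sum(q * cdf(pmf_x, z * w, left_limit=True)
                for w, q in pmf_y.items() if w < 0)
    return g0 + upper - lower

def quotient_cdf_direct(z, pmf_x, pmf_y):
    """H(z) = P{X / Y <= z} by enumerating all pairs of atoms."""
    return sum(p * q for x, p in pmf_x.items()
               for y, q in pmf_y.items() if x / y <= z)

# Hypothetical example distributions; Y has no mass at zero.
pmf_x = {Fraction(-1): Fraction(1, 4), Fraction(0): Fraction(1, 4), Fraction(2): Fraction(1, 2)}
pmf_y = {Fraction(-2): Fraction(1, 3), Fraction(1): Fraction(1, 3), Fraction(4): Fraction(1, 3)}

for z in (Fraction(-1), Fraction(-1, 4), Fraction(0), Fraction(1, 2), Fraction(2)):
    print(z, quotient_cdf_formula(z, pmf_x, pmf_y), quotient_cdf_direct(z, pmf_x, pmf_y))
```

The left-hand limit $F(zw - 0)$ matters only at values of $z$ for which $zw$ coincides with an atom of $X$; these are exactly the values collected in the set $M_1$ of Lemma 4.1.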

We shall not attempt to make a careful study of the above formula, such as 
the studies which certain writers have made of convolutions. However, it does 
seem desirable to place on record here certain remarks concerning it of a more 
or less superficial character. For convenience in later reference, we state these 
remarks in the form of four lemmas. 


Lemma 4.1: Let $M_1$ be the set of all values of $z$ such that if $z \in M_1$, the set of discontinuity points of $F(zw)$ on the w-axis has a point in common with the point spectrum of $G(w)$. Then if $z \in C(M_1)$,⁸ the integrals $\int_{0}^{\infty} F(zw \pm 0)\,dG(w)$, $\int_{-\infty}^{0} F(zw \pm 0)\,dG(w)$, are Riemann-Stieltjes integrals and consequently the integrands can be replaced by $F(zw)$ without altering the values of the integrals.

⁸ By $C(M_1)$ we mean the complement of $M_1$ with respect to the z-axis.

The lemma follows immediately from the definitions of Riemann-Stieltjes and Lebesgue-Stieltjes integrals.

Lemma 4.2: The set $M_1$ is denumerable.

The proof can easily be supplied by the reader.

Lemma 4.3: Let $M_2$ be the set of all values of $z$ such that if $z \in M_2$, $\Gamma(u)$ is discontinuous at $u = 0$. Then $M_2 \subset M_1$.

To prove this statement, we first observe that $\Gamma(u)$ is a genuine d.f. [1, p. 11]. For obviously $\Gamma(-\infty) = 0$, $\Gamma(+\infty) = 1$, and since $\Gamma_1(u)$ and $\Gamma_2(u)$ are both products of d.f.'s into constants, these two functions, and therefore $\Gamma(u)$, must be continuous from the right. It is this last property of $\Gamma(u)$ which is needed for our present purposes; in particular, we have the relation $\lim_{u \to +0} \Gamma(u) = \Gamma(0) = H(z)$. On the other hand, by the general convergence theorem for Lebesgue-Stieltjes integrals [4, pp. 663-664], we have

$\lim_{u \to -0} \Gamma(u) = G(0) + \int_{0}^{\infty} F(zw - 0)\,dG(w) - \int_{-\infty}^{0} F(zw)\,dG(w).$

If $z$ be chosen so that this integral and the ones in (4.11) are all Riemann-Stieltjes integrals, the expression $(zw - 0)$, wherever it appears, may be replaced by $zw$ without changing the values of the integrals. Thus for such a value of $z$, $\Gamma(+0) = \Gamma(-0)$. According to Lemma 4.1, we can be sure that at least if $z \in C(M_1)$, the integrals here will be Riemann-Stieltjes integrals, so our proposition is proved.

Since $H(z_1 + 0)$ is equal to $\Gamma(+0)$ with $z = z_1$, and $H(z_1 - 0)$ is equal to $\Gamma(-0)$ with $z = z_1$, we have the following result:

Lemma 4.4: The set $M_2$ is the set of discontinuity points of $H(z)$.

By using the alternate form of the convolutions used to derive (4.9), we obtain a representation of $\Gamma(u)$ somewhat more complicated than that appearing in (4.9). The corresponding formula for $H(z)$ is as follows:

G(0)[1 - F(-0)] - ^(^^(O) + f G(^^dF(v) 

(4.12) Hit) = F(0)[1 - G(0)] + G(0)ll - F(-0)], « = 0; 

1 + G(0)[1 - F(-0)] - G(0)F(0) + f G^^dFiv - 0) 

“ *> 0 . 
5. Representation of $H(z)$ by characteristic functions. A simple algebraic formula connecting the c.f. of $Z$ with those of $X$ and $Y$ is not available. However, there exists an interesting representation of $H(z)$ in terms of the functions $f(t)$, $g^{+}(t)$, $g^{-}(t)$. The result may be stated as follows:

Theorem 5.1:® Let the distributions of the independent variables X and Y have 
finite first absolute moments^ and let the integral 

(6.1) ^ j/(0g'^(-g0 +f(-t)g izt)\ ^ 


be finite for each value of z. 

+r)r"r 


Let A(u) be any dj, with a finite first absolute moment, 
dt he finite, where 6(0 is the cf, of A(w). Then 


(5.2) $H(z) = A(0) - \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{f(t)\,g^{+}(-zt) + f(-t)\,g^{-}(zt) - \delta(t)}{t}\,dt.$

If the integral obtained by formal differentiation under the integral sign with respect to $z$ in (5.2) is uniformly convergent in a certain interval $I$, then the frequency function $h(z)$ of the distribution of $Z$ exists in that interval and is given by the formula

$h(z) = \frac{1}{2\pi i} \int_{-\infty}^{\infty} [f(t)\,g^{+\prime}(-zt) - f(-t)\,g^{-\prime}(zt)]\,dt, \quad z \in I.$


We remark that the condition (5.1) will be satisfied for all values of $z$ if $f(t)$ alone satisfies a similar condition, inasmuch as $|g^{+}(t)| \le 1$, $|g^{-}(t)| \le 1$. Important special cases of the theorem arise when $A(u)$ is replaced by $F(u)$ or $G(u)$, and when $A(u)$ is so chosen that $A(0) = 0$.

Our proof of the theorem will depend on a rather general result due to Cramér [1, Theorem 12], which we shall restate here in the special form applicable to the problem at hand.

Lemma 5.1: Let R{u) be a function of bounded variation over the infinite 
interval — oo < u < let lim R{u) = lim R{u) = 0, and let r{t) = 

£" dR{u). If (a) \u\ dR{u) and (b) (/.♦ni ^ dt, both are 

finite, then for every value of u, 


$R(u) = -\frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{r(t)}{t}\,e^{-itu}\,dt.$


To prove Theorem 5.1, we observe that since $\Gamma(u)$ is a d.f. (see proof of Lemma 4.3), the difference $\Gamma(u) - A(u)$ is a function similar to the function $R(u)$ of the lemma. If we do let $R(u) = \Gamma(u) - A(u)$, it follows at once that $r(t) = \gamma(t) - \delta(t) = f(t)\,g^{+}(-zt) + f(-t)\,g^{-}(zt) - \delta(t)$. If we can verify that this $R(u)$


⁹ The theorem is due to Cramér in the case in which $G(0) = 0$, and $A(u) \equiv G(u)$. See [1, Theorem 16].
satisfies conditions (a) and (b) of the lemma, then we shall have established the relation,

$\Gamma(u) = A(u) - \frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{f(t)\,g^{+}(-zt) + f(-t)\,g^{-}(zt) - \delta(t)}{t}\,e^{-itu}\,dt$

for all values of $u$, and letting $u = 0$ in this equation, we shall obtain (5.2).

Condition (b) in the lemma is taken care of by (5.1) and the condition on $\delta(t)$ in Theorem 5.1. Clearly condition (a) will be satisfied if it turns out that $\Gamma(u)$ has a finite first absolute moment. Now the existence of finite first absolute moments of $X$ and $Y$ will insure the existence of finite first absolute moments for the conditional distributions involved in the definitions of $\Gamma_1(u)$ and $\Gamma_2(u)$, because $E|X - zY| \le E|X| + |z|\,E|Y|$. It follows quite readily from this that the first absolute moment of $\Gamma(u)$ is finite. The proof of the theorem is complete.

6. Distributions of variable form. We consider now the case in which the 
distributions of the numerator and denominator approach limiting forms. 

Theorem 6.1: Let the independent variables $X_\alpha$ and $Y_\beta$ have respective d.f.'s $F_\alpha(x)$ and $G_\beta(y)$ which depend upon the two parameters $\alpha$ and $\beta$. Let $H_{\alpha,\beta}(z)$ be the d.f. of the quotient $Z_{\alpha,\beta} = X_\alpha/Y_\beta$. If there exist two chance variables $X$ and $Y$ with respective distribution functions $F(x)$ and $G(y)$ such that $\lim_{\alpha \to \infty} F_\alpha(x) = F(x)$ at all points of continuity of $F(x)$, and $\lim_{\beta \to \infty} G_\beta(y) = G(y)$, at all points of continuity of $G(y)$, then

(6.1) $\lim_{\alpha \to \infty,\ \beta \to \infty} H_{\alpha,\beta}(z) = \lim_{\alpha \to \infty} \lim_{\beta \to \infty} H_{\alpha,\beta}(z) = \lim_{\beta \to \infty} \lim_{\alpha \to \infty} H_{\alpha,\beta}(z) = H(z)$

at all points of continuity of $H(z)$, where $H(z)$ is the d.f. of the variable $X/Y$. The double limit in (6.1) is uniform in any finite or infinite interval of continuity of $H(z)$.

In the interpretation of the limits involved in this theorem, it is to be understood that in the hypotheses, $\alpha$ may tend to infinity over any unbounded set $T_\alpha$ of the $\alpha$-axis, and $\beta$ may tend to infinity over any unbounded set $T_\beta$ of the $\beta$-axis, provided that in (6.1), $\alpha$ and $\beta$ are restricted so that $\alpha \in T_\alpha$, and $\beta \in T_\beta$.

To prove the theorem, we introduce functions $f_\alpha(t)$, $g^{+}_\beta(t)$, $g^{-}_\beta(t)$, $\Gamma_{\alpha,\beta}(u)$, $\gamma_{\alpha,\beta}(t)$, which are defined by equations (4.1) and (4.3) with $F$, $G$, $X$, $Y$ replaced respectively by $F_\alpha$, $G_\beta$, $X_\alpha$, $Y_\beta$. On the other hand, with reference to the distributions of $X$ and $Y$, we employ the notation of section 4 without modification. According to the work in that section, $\Gamma(u)$ is given by (4.9) and its c.f. $\gamma(t)$ is given by (4.10). Also,

$\gamma_{\alpha,\beta}(t) = f_\alpha(t)\,g^{+}_\beta(-zt) + f_\alpha(-t)\,g^{-}_\beta(zt).$

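But it is an immediate consequence of our hypotheses that $\lim_{\alpha \to \infty} f_\alpha(t) = f(t)$, $\lim_{\beta \to \infty} g^{+}_\beta(t) = g^{+}(t)$, and $\lim_{\beta \to \infty} g^{-}_\beta(t) = g^{-}(t)$, each of the limits being uniform in any finite interval of values of $t$.¹⁰ Thus

(6.2) $\lim_{\alpha \to \infty,\ \beta \to \infty} \gamma_{\alpha,\beta}(t) = \lim_{\alpha \to \infty} \lim_{\beta \to \infty} \gamma_{\alpha,\beta}(t) = \lim_{\beta \to \infty} \lim_{\alpha \to \infty} \gamma_{\alpha,\beta}(t) = \gamma(t),$

uniformly in any finite interval on the t-axis.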

Consider the extreme members of (6.2). It follows immediately from a well-known general theorem¹¹ that $\lim_{\alpha \to \infty,\ \beta \to \infty} \Gamma_{\alpha,\beta}(u) = \Gamma(u)$ at all continuity points of $\Gamma(u)$. Then since $H_{\alpha,\beta}(z) = \Gamma_{\alpha,\beta}(0)$ and $H(z) = \Gamma(0)$, we find that

$\lim_{\alpha \to \infty,\ \beta \to \infty} H_{\alpha,\beta}(z) = H(z), \quad z \in C(M_2),$

where $M_2$ is the set defined in Lemma 4.3. By Lemma 4.4, the set $M_2$ is the set of discontinuity points of $H(z)$, so the equality of the first and last members of (6.1) is established at all continuity points of $H(z)$. The uniformity of the limit is due to a general property of convergent sequences of d.f.'s; see [1, p. 31].

The existence and equivalence to $H(z)$ of each of the iterated limits in (6.1) may be established by two consecutive applications of the foregoing argument, and by the use of (6.2). We leave the details to the reader.

It is to be remarked that both $H_{\alpha,\beta}(z)$ and $H(z)$ can be represented by (4.11), provided, of course, that $F$ and $G$ in (4.11) are replaced by $F_\alpha$ and $G_\beta$ in the case of $H_{\alpha,\beta}$; thus our theorem essentially states that the order of the double limit and the integration is immaterial in this formula. A similar remark applies to formula (5.2).

The reader is reminded that we have tacitly been assuming that the d.f. of any variable appearing in a denominator is continuous at the origin. In case $G_\beta(y)$ does not satisfy this condition, but $G(y)$ does satisfy it, and if, as suggested in section 2, we consider $H_{\alpha,\beta}(z)$ to be the d.f. of the conditional distribution of $Z_{\alpha,\beta}$ relative to the hypothesis that $Y_\beta \ne 0$, then it can be shown rather easily that Theorem 6.1 remains true with this modified interpretation. But if $G(y)$ is discontinuous at the origin, and if $H(z)$ is interpreted as the d.f. of the conditional distribution, then (6.1) may be no longer true, as can be shown by trivial examples.

Perhaps the most important cases of variable distributions arise in the con- 
sideration of sums of independent chance variables. We accordingly present the 
following synthesis of Theorem 6.1 and a simple case of the Central Limit 
Theorem. 

Theorem 6.2: Let $U_1, U_2, \cdots$, be a sequence of identically distributed chance variables, each with mean zero and (finite) standard deviation $\sigma_U$, and let $V_1$,

¹⁰ See [1, p. 30].

¹¹ See [1, Theorem 11]. The result needed here is a trivial extension of the theorem cited.
$V_2, \cdots$, be a sequence of identically distributed chance variables, each with mean zero and (finite) standard deviation $\sigma_V$. Furthermore, let the variables $U_i$ and $V_j$ be all independent, $i = 1, 2, \cdots$, $j = 1, 2, \cdots$. If $m$ and $n$ tend to infinity in such a way that

(6.3) $\lim \sqrt{n/m} = k,$

then the d.f. of the conditional distribution of the variable

$W_{m,n} = \frac{U_1 + U_2 + \cdots + U_m}{V_1 + V_2 + \cdots + V_n},$

relative to the hypothesis that the denominator is different from zero, tends uniformly to the function

(6.4) $J(w) = \frac{1}{\pi} \int_{-\infty}^{w} \frac{k\,\sigma_U \sigma_V}{\sigma_V^2 k^2 t^2 + \sigma_U^2}\,dt.$

For if we let

$Z_{m,n} = \frac{(U_1 + U_2 + \cdots + U_m)/(\sigma_U \sqrt{m})}{(V_1 + V_2 + \cdots + V_n)/(\sigma_V \sqrt{n})},$

then $W_{m,n} = \sqrt{m/n}\,(\sigma_U/\sigma_V)\,Z_{m,n}$. The Central Limit Theorem [1, Theorem 20] states that the d.f.'s of the numerator and denominator of $Z_{m,n}$ each tend to the function $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt$, which is the d.f. of a normal distribution with mean zero and variance one. By (3.3), the quotient of two variables, each of which has this d.f., has the continuous d.f. $H(z) = \int_{-\infty}^{z} (1/\pi)[1/(1 + x^2)]\,dx$. If we let $H_{m,n}(z)$ denote the d.f. of the conditional distribution of $Z_{m,n}$, relative to the hypothesis that the denominator of $Z_{m,n}$ is different from zero, then by Theorem 6.1, $\lim_{m \to \infty,\ n \to \infty} H_{m,n}(z) = H(z)$ uniformly in $z$. Now the d.f. of the conditional distribution of $W_{m,n}$ is $H_{m,n}[\sqrt{n/m}\,(\sigma_V/\sigma_U)\,w]$, and because of (6.3) and the uniformity of the limit of $H_{m,n}(z)$, this approaches $H[k(\sigma_V/\sigma_U)w]$.

Differentiating the last expression with respect to $w$, we find that the resulting frequency function is equal to $J'(w)$; and this concludes the proof.

As an application of the theorem, let us consider the following problem. From an urn containing white and black balls in the proportion of $p$ to $1 - p$, we shall make 100 random drawings of a single ball with replacement after each drawing. Let $W_{50,50}$ be the ratio of the deviation of the number of white balls in the first 50 drawings from the expected number, to the deviation of the number of white balls in the second 50 drawings from the expected number. What is the approximate value of $w$ for which $P\{W_{50,50} \ge w \mid b\} = .05$, where the hypothesis $b$ is that the denominator of $W_{50,50}$ shall be different from zero?¹² To answer this question, we observe that the numerator and denominator of $W_{50,50}$ can each be expressed as the sum of 50 independent identical chance variables, each with mean zero and with variance $p(1 - p)$. Thus according to Theorem 6.2, the approximate d.f. of $W_{50,50}$ is

$J(w) = \int_{-\infty}^{w} \frac{1}{\pi} \frac{dt}{1 + t^2} = \frac{1}{2} + \frac{1}{\pi} \arctan w,$

and the required value of $w$ satisfies the equation $J(\infty) - J(w) = .05$. The solution of this equation (correct to one decimal place) is $w = 6.3$.
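
The value quoted for $w$ can be reproduced directly. The short Python sketch below solves $J(\infty) - J(w) = .05$ in closed form, using $J(w) = 1/2 + (1/\pi)\arctan w$, so that $w = \tan[\pi(1/2 - .05)]$. The function names are hypothetical, and the calculation rests on the same limiting approximation as the text, not on the exact distribution of $W_{50,50}$.

```python
import math

def limiting_df(w):
    """J(w) = 1/2 + arctan(w) / pi, the limiting d.f. for k = 1 and equal variances."""
    return 0.5 + math.atan(w) / math.pi

def upper_tail_point(tail_probability):
    """Solve 1 - J(w) = tail_probability for w."""
    return math.tan(math.pi * (0.5 - tail_probability))

w = upper_tail_point(0.05)
print(round(w, 1))           # value of w to one decimal place
print(1.0 - limiting_df(w))  # recovers the tail probability
```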

It is perhaps needless to remark that a study of the error involved in supposing $J(w)$ to be the d.f. of $W_{m,n}$ in Theorem 6.2, must necessarily precede the unreserved acceptance of numerical results obtained by means of that theorem.


7. Products of chance variables. We conclude this paper with a rather brief treatment of the distribution of the product of two chance variables. To preserve a notation uniform with that of the preceding sections, we shall write the product as $X = YZ$, where the d.f.'s of $X$, $Y$, and $Z$ are to be denoted, as before, by $F(x)$, $G(y)$, and $H(z)$, respectively. The existence of $F(x)$ is readily proved by the methods of section 2. The assumption that $P\{Y = 0\} = 0$ is of course unnecessary here, and will be dropped in this section.

In the continuous case, an argument similar to the one employed in section 3 will establish the following result:

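Theorem 7.1: If the joint variable $(Y, Z)$ has the frequency function $\psi(y, z)$, then

$F(x) = \int_{-\infty}^{x} \int_{-\infty}^{\infty} \left|\frac{1}{v}\right| \psi\!\left(\frac{u}{v}, v\right) dv\,du,$

and consequently $F(x)$ is an absolutely continuous function of $x$. The frequency function of the distribution of $X$ exists almost everywhere, and is given by the formula

(7.1) $f(x) = F'(x) = \int_{-\infty}^{\infty} \left|\frac{1}{v}\right| \psi\!\left(\frac{x}{v}, v\right) dv = \int_{-\infty}^{\infty} \left|\frac{1}{v}\right| \psi\!\left(v, \frac{x}{v}\right) dv.$
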

In the discontinuous case, with $Y$ and $Z$ independent, we can write $X = ZY = Z/(1/Y)$ and use Theorem 4.1 to derive a formula for $F(x)$. We have:

$F(x) = P\{X \le x\} = P\{Y \ne 0\}\,P\{X \le x \mid Y \ne 0\} + P\{X \le x;\ Y = 0\}.$


¹² This hypothesis would always be fulfilled in case $50p$ is not an integer.
Excluding for a moment the trivial case in which $P\{Y \ne 0\} = 0$, let $G_1(y)$ be the d.f. of the conditional distribution of $(1/Y)$ relative to the hypothesis that $Y \ne 0$. Then

$P\{Y \ne 0\}\,G_1(y) = \begin{cases} G(-0) + 1 - G[(1/y) - 0], & y > 0, \\ G(-0), & y = 0, \\ G(-0) - G[(1/y) - 0], & y < 0. \end{cases}$

It is to be observed that $G_1(y)$ is continuous at $y = 0$. Using Theorem 4.1, we find that

$P\{X \le x \mid Y \ne 0\} = G_1(0) + \int_{0}^{\infty} H(xw)\,dG_1(w) - \int_{-\infty}^{0} H(xw - 0)\,dG_1(w).$

So

$P\{Y \ne 0\}\,P\{X \le x \mid Y \ne 0\} = G(-0) + \int_{0}^{\infty} H(xw)\,d\{-G[(1/w) - 0]\} - \int_{-\infty}^{0} H(xw - 0)\,d\{-G[(1/w) - 0]\}$

$= G(-0) + \int_{0+}^{\infty} H(x/v)\,dG(v) - \int_{-\infty}^{0-} H[(x/v) - 0]\,dG(v).$

This equation is trivially true if $P\{Y \ne 0\} = 0$. Also,

$P\{X \le x;\ Y = 0\} = \begin{cases} 0, & x < 0, \\ G(0) - G(-0), & x \ge 0. \end{cases}$

Thus we obtain the following theorem: 

Theorem 7.2: If $Y$ and $Z$ are independent chance variables with respective d.f.'s $G(y)$ and $H(z)$, then the d.f. of their product is given by the formula

(7.2) $F(x) = \int_{0+}^{\infty} H(x/v)\,dG(v) - \int_{-\infty}^{0-} H[(x/v) - 0]\,dG(v) + \begin{cases} G(-0), & x < 0, \\ G(0), & x \ge 0, \end{cases}$

for all values of $x$.
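
A discrete example makes the role of the constant term in (7.2) visible. The Python sketch below is an illustration only; it assumes (7.2) is read as $F(x) = \sum_{v > 0} H(x/v)\,\Delta G(v) - \sum_{v < 0} H[(x/v) - 0]\,\Delta G(v) + c(x)$, with $c(x) = G(-0)$ for $x < 0$ and $c(x) = G(0)$ for $x \ge 0$, where the sums run over the atoms of $Y$. The function names and example distributions are hypothetical, and $Y$ is deliberately given positive mass at zero.

```python
from fractions import Fraction

def cdf(pmf, x, left_limit=False):
    """Return P{V <= x}, or the left-hand limit P{V < x} when left_limit is True."""
    if left_limit:
        return sum(p for value, p in pmf.items() if value < x)
    return sum(p for value, p in pmf.items() if value <= x)

def product_cdf_formula(x, pmf_y, pmf_z):
    """F(x) from (7.2), with the Stieltjes integrals written as sums over atoms of Y."""
    positive = sum(q * cdf(pmf_z, x / v) for v, q in pmf_y.items() if v > 0)
    negative = sum(q * cdf(pmf_z, x / v, left_limit=True)
                   for v, q in pmf_y.items() if v < 0)
    # Constant term: G(-0) when x < 0, G(0) when x >= 0.
    constant = cdf(pmf_y, 0, left_limit=True) if x < 0 else cdf(pmf_y, 0)
    return positive - negative + constant

def product_cdf_direct(x, pmf_y, pmf_z):
    """F(x) = P{YZ <= x} by enumerating all pairs of atoms."""
    return sum(q * p for y, q in pmf_y.items()
               for z, p in pmf_z.items() if y * z <= x)

# Hypothetical example distributions; Y has mass 1/4 at zero.
pmf_y = {Fraction(-2): Fraction(1, 4), Fraction(0): Fraction(1, 4), Fraction(3): Fraction(1, 2)}
pmf_z = {Fraction(-1): Fraction(1, 3), Fraction(1, 2): Fraction(1, 3), Fraction(2): Fraction(1, 3)}

for x in (Fraction(-4), Fraction(-1), Fraction(0), Fraction(3, 2), Fraction(6)):
    print(x, product_cdf_formula(x, pmf_y, pmf_z), product_cdf_direct(x, pmf_y, pmf_z))
```

The jump of the constant term at $x = 0$ equals $G(0) - G(-0) = P\{Y = 0\}$, which is the mass that the product places at zero through the event $Y = 0$.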
SOME GENERALIZATIONS OF THE LOGARITHMIC MEAN AND OF SIMILAR MEANS OF TWO VARIATES WHICH BECOME INDETERMINATE WHEN THE TWO VARIATES ARE EQUAL

By Edward L. Dodd
University of Texas

1. Introduction. The logarithmic mean $m$ of positive numbers, $x$ and $y$, as given by

(1) $m = \frac{y - x}{\log_e y - \log_e x} = \frac{y - x}{\log_e (y/x)}$

is of considerable importance in problems¹ relating to the flow of heat.

The logarithmic mean arises, moreover, in less technical problems such as the 
following: Given that incomes t in the interval, x ^ i ^ y, are distributed with 
frequency inversely proportional to t. That is, with k = a positive constant, 

(2) 4>it) dt = {k/t) di 

is the number of individuals with incomes lying between t and t + di. Then, 
with a: > 0, the total number / of individual incomes is 

(3) f ^ f dt = fc(log y - log x). 

The combined income g of the group is 

(4) g = f t<f>{l)dt - k{y - x). 

J X 

And thus the logarithmic mean g/f of the two numbers x and 2 / in (1) is the 
arithmetic mean of all the incomes; that is, the average income — ^at least to a 
close approximation if the group is large enough that integration may replace 
summation. 

Now m in (1) becomes indeterminate, if x = Nevertheless, if c > 0, and 
c and y c, then m--^ c. Thus, we may properly speak of m as a mean of 
these two variates, x and y. 
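As an illustration of (1) through (4), the following sketch computes the logarithmic mean directly and also as the ratio g/f obtained by numerical integration. The function names, the midpoint rule, and the step count are assumptions made for the example; the constant k cancels in the ratio and is omitted.

```python
import math

def logarithmic_mean(x: float, y: float) -> float:
    """Logarithmic mean (1) of two positive numbers; returns the limit c when x == y == c."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    if x == y:
        return x  # limiting value; formula (1) itself is 0/0 here
    return (y - x) / (math.log(y) - math.log(x))

def average_income(x: float, y: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of g/f from (3) and (4) with phi(t) = 1/t."""
    h = (y - x) / steps
    mids = [x + (i + 0.5) * h for i in range(steps)]
    f = sum(h / t for t in mids)        # number of incomes, up to the factor k
    g = sum(h * t / t for t in mids)    # combined income, up to the factor k
    return g / f
```

For incomes between 1000 and 2000 both routes give roughly 1442.7, which lies between the two limits as a mean should.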

This logarithmic mean is one of a set of means studied by Renzo Cisbani², the
general form being

¹ See Walker, Lewis, and McAdams, Principles of Chemical Engineering, McGraw Hill &
Co., Part IV, Logarithmic mean temperature difference.

² R. Cisbani, "Contributi alla teoria delle medie." Metron, Vol. 13 (1938), pp. 23-34.
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(5) 


■ _ gii+i ni/» 

M + my - aO. 


and the logarithmic mean appearing when x = 1, j — » 0. 

In a chart between pages 28 and 29 Cisbani exhibits thirty varieties of these
means (5). It will be noticed that z is indeterminate if a = b.

Some methods for dealing with means which may become indeterminate
forms I have indicated in a recent paper.³

Now a generalization from a mean of two variates to a mean of three or more variates may sometimes seem to be immediate. However, for the arithmetic mean $(x + y)/2$ of two variates x and y, the function $[\min(x, y, z) + \max(x, y, z)]/2$ is as much a generalization as is the arithmetic mean $(x + y + z)/3$. Actually, the direction in which generalization is to take place is arbitrary. However, it is natural to expect the generalization to arise from a problem somewhat similar to one that may give rise to the original mean. And it is desirable that to the generalization should be carried over as many properties or characteristics of the original as is possible.

In the foregoing illustration, we considered a single interval $x \le t \le y$ in which incomes are distributed in accordance with a relative frequency proportional to $\phi(t)$. And the arithmetic mean of all these incomes was obtained as a logarithmic mean of the two range limits x and y, at least approximately, allowing integration to take the place of summation. If $\phi(t)$ had been $kt^{-3/2}$ instead of $kt^{-1}$, then the average of all the incomes would have been the geometric mean of the two range limits x and y.

To effect a first generalization, we shall now suppose an original interval $x_0$ to $x_n$, to be divided into n subintervals by points $x_r$ such that

$$x_0 \le x_1 \le x_2 \le \cdots \le x_{n-1} \le x_n. \tag{6}$$

For each subinterval $x_{r-1}$ to $x_r$ the same function $\phi(t)$ will be used to describe relative frequency; but the total population for this subinterval will be controlled by a positive constant $k_r$, in general different for the different subintervals. This may be described as stratification. To make this more concrete, let us suppose, as before, that $\phi(t) = k/t$. Then, with $x_0 > 0$, the mean M, which will be described more in detail in the next section, will take the form

$$M = \frac{\sum_r k_r (x_r - x_{r-1})}{\sum_r k_r \log (x_r/x_{r-1})}. \tag{7}$$

Applied to incomes, M would, like m in (1), give average income. To get some idea of the significance of $k_r$, let us imagine that in some community there are $f_r$ individuals in the income bracket $x_{r-1}$ to $x_r$, say from \$1001 to \$2000. Let us suppose now that $f_r$ other individuals with incomes between \$1001 and \$2000 distributed in exactly the same manner move into this same community. Then $k_r$ would be changed to $k'_r = 2k_r$. But, of course, among the entire $2f_r$ individuals the relative distribution of incomes is exactly the same as among the original $f_r$ individuals.

In this interpretation $k_r$ is a weight for a bracket of items. But, taking M in (7) just as it stands, $k_r$ is the weight for the consecutive pair of numbers $x_{r-1}$ and $x_r$.

³ "The substitutive mean and certain subclasses of this general mean." Annals of Math. Stat., Vol. 11 (1940), pp. 163-176. See p. 171.

2. The first generalization. When t is in some interval, $I = (a, a')$, finite or infinite, let $\phi(t)$ be a non-negative, integrable function of t.

And in I let the points at which $\phi(t) = 0$, if any, form a null-set. Then, with t in I, write

$$\Phi(t) = \int_a^t \phi(t)\,dt. \tag{8}$$

And, supposing that in (6), $a < x_0$, $x_n < a'$, set

$$f_r = \int_{x_{r-1}}^{x_r} \phi(t)\,dt = \Phi(x_r) - \Phi(x_{r-1}), \qquad r = 1, 2, \cdots, n. \tag{9}$$

Then $f_r > 0$; since $\phi(t) > 0$ and is continuous almost everywhere in $(x_{r-1}, x_r)$. Since in any finite subinterval of I, $t\phi(t)$ is integrable, we may set

$$\Psi(t) = \int_a^t \psi(t)\,dt = \int_a^t t\,\phi(t)\,dt, \tag{10}$$

$$g_r = \int_{x_{r-1}}^{x_r} \psi(t)\,dt = \Psi(x_r) - \Psi(x_{r-1}). \tag{11}$$

Now, by a mean value theorem, there exists a number $t'_r$ such that

$$g_r/f_r = t'_r, \qquad x_{r-1} < t'_r < x_r. \tag{12}$$

Taking positive numbers $k_r$, the weighted arithmetic mean of $g_r/f_r$, with weights $k_r f_r$ is then

$$M = \frac{\sum k_r g_r}{\sum k_r f_r} = \frac{\sum_1^n k_r[\Psi(x_r) - \Psi(x_{r-1})]}{\sum_1^n k_r[\Phi(x_r) - \Phi(x_{r-1})]}. \tag{13}$$

If $\phi(t) = k/t$, this becomes the mean (7) associated with the logarithmic mean. Now, since for (13) the weights $k_r f_r$ are positive, it follows from (12) that

$$x_0 < t'_1 \le M \le t'_n < x_n. \tag{14}$$

Suppose, now, that b lies in I, and that subject to (6) each $x_r \to b$. Then, by (14), $M \to b$. And thus M is an internal mean of $x_0, x_1, \cdots, x_n$, although with the x's all equal, M assumes an indeterminate form.
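A minimal sketch of the stratified mean, assuming $\phi(t) = 1/t$ so that (13) takes the form (7). The function name and the sample brackets are hypothetical; the points are taken strictly increasing and positive so that every $f_r$ is positive.

```python
import math

def stratified_log_mean(xs, ks):
    """Mean (7): xs = [x_0, ..., x_n] increasing and positive, ks = [k_1, ..., k_n] positive."""
    if len(ks) != len(xs) - 1:
        raise ValueError("need one weight per subinterval")
    pairs = list(zip(ks, xs, xs[1:]))
    numerator = sum(k * (b - a) for k, a, b in pairs)            # sum of k_r * g_r
    denominator = sum(k * math.log(b / a) for k, a, b in pairs)  # sum of k_r * f_r
    return numerator / denominator
```

Doubling every $k_r$ leaves M unchanged, and equal weights collapse M to the logarithmic mean of $x_0$ and $x_n$ alone, in line with the remarks on symmetry made later in this section.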

In (13) the $k_r$ are applied to pairs of numbers, either to $\Psi(x_r) - \Psi(x_{r-1})$ or to $\Phi(x_r) - \Phi(x_{r-1})$, whereas in most weighted means, the weights are applied to individual numbers. We consider now a form equivalent to (13), but in which the weights $c_r$ are attached to the individual numbers. It seemed possible to get a more general mean than (13) by abandoning certain conditions upon the weights $c_r$ which first arose. But such relaxing of restrictions leads to difficulties, as will be shown. By setting

$$c_0 = -k_1, \qquad c_n = k_n, \qquad c_r = k_r - k_{r+1}, \qquad r = 1, 2, \cdots, n - 1, \tag{15}$$

we may write M in the form:

$$M = \frac{\sum_0^n c_r \Psi(x_r)}{\sum_0^n c_r \Phi(x_r)}. \tag{16}$$


On the other hand, if we choose c's subject to

$$c_0 < 0, \qquad c_r < -(c_0 + c_1 + \cdots + c_{r-1}) \quad \text{for } 0 < r < n, \tag{17}$$

$$c_n = -\sum_0^{n-1} c_r; \tag{18}$$

then positive k's can be found to pass from (16) back to (13).

The question arises whether if the conditions (17) are abandoned, and with the $c_r$ not all zero, (18) is retained as

$$\sum_0^n c_r = 0; \qquad \text{some } c_r \ne 0, \tag{19}$$

M in (16) will continue to be a mean of $x_0, x_1, \cdots, x_n$, possibly, an external mean.

It may be noted that the condition $\sum c_r = 0$ arises from the fact that when parentheses are removed from (13), each $k_r$ is matched by $-k_r$.

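By an example, it will be shown that under (19) alone, M in (16) may fail to be a mean. In (8) and (10) take a = 0. Then with n = 2, $\Phi(t) = t$ (that is, $\phi(t) = 1$ and $\Psi(t) = t^2/2$), take $c_0 = 1$, $c_1 = -2$, $c_2 = 1$ in (16). Then

$$M = \frac{x_0^2 - 2x_1^2 + x_2^2}{2(x_0 - 2x_1 + x_2)}. \tag{20}$$

If b > 0, $\epsilon = x_0 - b$, $\eta = x_1 - b$, and $\zeta = x_2 - b$, then

$$M = b + \frac{\epsilon^2 - 2\eta^2 + \zeta^2}{2(\epsilon - 2\eta + \zeta)}. \tag{21}$$

If now $\eta = 2\epsilon$, and $\zeta = 3\epsilon + \epsilon^2$, then

$$M = b + (2 + 6\epsilon + \epsilon^2)/2 \to b + 1, \quad \text{as } \epsilon \to 0. \tag{22}$$

Since M does not approach b here, when $x_0$, $x_1$, and $x_2 \to b$, in the manner specified, M in (20) is not a mean of $x_0$, $x_1$, and $x_2$.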
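The counterexample can be reproduced in exact arithmetic. The sketch below assumes the reading $\Phi(t) = t$, $\Psi(t) = t^2/2$ for (20) and follows the path $\eta = 2\epsilon$, $\zeta = 3\epsilon + \epsilon^2$ of (22); the function names are chosen for the illustration.

```python
from fractions import Fraction

def m20(x0, x1, x2):
    """The function M of (20), with weights c = (1, -2, 1)."""
    return (x0 ** 2 - 2 * x1 ** 2 + x2 ** 2) / (2 * (x0 - 2 * x1 + x2))

def m_along_path(b, eps):
    """M evaluated at x0 = b + eps, x1 = b + 2 eps, x2 = b + 3 eps + eps^2."""
    return m20(b + eps, b + 2 * eps, b + 3 * eps + eps ** 2)

b = Fraction(5)
for eps in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10 ** 6)):
    # All three arguments shrink toward b, yet M stays near b + 1.
    print(float(eps), float(m_along_path(b, eps)))
```

Rational arithmetic is used because the denominator of (20) is of order $\epsilon^2$ along this path, so floating point would lose accuracy to cancellation for very small $\epsilon$.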

We may enquire, further, whether the function M in (16) could be a mean if, discarding (13), (17) and (18), we put upon $c_r$ the single restriction $c_r > 0$. In that case, if $x_0 < x_1 < \cdots < x_n$, then, since $\Phi(t)$ and $\Psi(t)$ are continuous functions of t (see (8), (10)), it would follow that if each $x_r \to t$, then $M \to \Psi(t)/\Phi(t)$. But if M is to be a mean of $x_0, x_1, \cdots, x_n$, then $M \to t$ when each $x_r \to t$. Thus we are led to $\Psi(t) = t\Phi(t)$. Except possibly for points of a null set, $\Phi(t)$ and $\Psi(t)$ have derivatives $\phi(t)$ and $\psi(t)$; and thus

$$\psi(t) = \Psi'(t) = t\Phi'(t) + \Phi(t) = t\phi(t) + \Phi(t). \tag{23}$$

But then, since $\psi(t) = t\phi(t)$ (see (10)), it would follow that $\Phi(t) = 0$ almost everywhere in I; but $\Phi(t) > 0$, if t > a. Hence the assumption $c_r > 0$ is not sufficient to make the function in (16) a mean of $x_0, x_1, \cdots, x_n$.

In the simple case of n = 1, M becomes

$$M = \frac{\Psi(x_1) - \Psi(x_0)}{\Phi(x_1) - \Phi(x_0)}; \tag{24}$$

and this is a symmetrical function of $x_0$ and $x_1$.

The question arises whether if n > 1, M in (13) or (16) can be a symmetrical function of $x_0, x_1, \cdots, x_n$. Assume, if possible, that with x < y < z,

$$H(x, y, z) = \frac{c_0\Psi(x) + c_1\Psi(y) + c_2\Psi(z)}{c_0\Phi(x) + c_1\Phi(y) + c_2\Phi(z)} \tag{25}$$

is a symmetrical function of x, y and z. Now if a/b = c/d, and $b - d \ne 0$, it is well known that a/b = (a - c)/(b - d).

Hence, if H(x, y, z) = H(z, y, x), and $c_0 \ne c_2$, then

$$H(x, y, z) = \frac{(c_0 - c_2)[\Psi(x) - \Psi(z)]}{(c_0 - c_2)[\Phi(x) - \Phi(z)]}, \tag{26}$$

which is not symmetrical in the three variables. Then H is not symmetrical in x, y and z, unless, possibly, when $c_0 = c_2$.

Likewise from H(x, y, z) = H(x, z, y), we are led to the conclusion that H is not a symmetrical function of x, y, and z, unless possibly when $c_1 = c_2$. But $c_0 = c_1 = c_2$ substituted into (15) makes $k_1 = k_2 = 0$, which is contrary to hypothesis that $k_r > 0$. Then in (25) the constants $c_0$, $c_1$ and $c_2$ can not be chosen in conformity with (15) so as to make H(x, y, z) a symmetrical function of the three variables.

Symmetry in two variables will appear, however, if the mean (13) reduces to a mean of just two variables as it does when each $k_r = k$, constant, in which case,

$$M = \frac{\Psi(x_n) - \Psi(x_0)}{\Phi(x_n) - \Phi(x_0)}. \tag{27}$$

Although in the generalization (13) symmetry is thus lost, another property, homogeneity is retained in what seem to be the most important cases.

Most means $Q(x, y, \cdots, w)$ in common use are homogeneous functions of their arguments. That is, if c is a constant, and $Q(x, y, \cdots, w)$ and $Q(cx, cy, \cdots, cw)$ are both defined when $x, y, \cdots, w$ lie in some interval J, then

$$Q(cx, cy, \cdots, cw) = cQ(x, y, \cdots, w). \tag{28}$$

This homogeneity is associated geometrically with ruled surfaces, in particular with cones.

With reference to (8) and (10), let us write

$$F(x, y) = \frac{\Psi(y) - \Psi(x)}{\Phi(y) - \Phi(x)}. \tag{29}$$

And now, let us consider a special variety of means obtained by taking in (8)

$$\phi(t) = t^q, \tag{30}$$

where q is any real number. Then F(x, y) is a homogeneous mean; that is,

$$F(cx, cy) = cF(x, y). \tag{31}$$

This is valid, indeed, even in the special cases, q = 0, -1, and -2, which lead, respectively to the arithmetic mean, the logarithmic mean (1) and to a second variety of logarithmic mean

$$\frac{xy(\log y - \log x)}{y - x} \tag{32}$$

exhibited by Cisbani. It may be noted that q = -3/2 leads to the geometric mean, and q = -3 to the harmonic mean of x and y.

It is conceivable that for $\phi(t)$ other functions than $t^q$ (functions not equivalent to $t^q$ in integration) might be used to lead to a homogeneous F(x, y) in (29). But such functions, if any, would hardly seem to be in common use.

The M in (13) retains the property of homogeneity, at least for $\phi(t) = t^q$; and so will also the more general means exhibited in the next section.
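The special cases named above can be verified with a short sketch of (29) under $\phi(t) = t^q$. The helper names are assumptions for the illustration, and the inputs are taken positive with x < y.

```python
import math

def power_integral(x, y, p):
    """Integral of t**p from x to y for positive x and y."""
    if p == -1:
        return math.log(y) - math.log(x)
    return (y ** (p + 1) - x ** (p + 1)) / (p + 1)

def F(x, y, q):
    """Mean (29) with phi(t) = t**q: integral of t**(q+1) over integral of t**q."""
    return power_integral(x, y, q + 1) / power_integral(x, y, q)

x, y = 2.0, 8.0
print(F(x, y, 0))      # arithmetic mean (x + y)/2
print(F(x, y, -1))     # logarithmic mean (1)
print(F(x, y, -1.5))   # geometric mean sqrt(x*y)
print(F(x, y, -2))     # second logarithmic mean (32)
print(F(x, y, -3))     # harmonic mean 2xy/(x + y)
```

Homogeneity (31) can be checked the same way by comparing `F(c*x, c*y, q)` with `c*F(x, y, q)` for any positive c.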


3. Further generalization. The means of Cisbani (5) suggest the following generalization. Let p be an integer or the reciprocal of an odd integer. With the notation of (13), take $k_r > 0$, and

$$F_p = \sum_r k_r f_r^p, \qquad G_p = \sum_r k_r g_r^p, \tag{33}$$

$$M_p = [G_p/F_p]^{1/p}. \tag{34}$$

Indeed, if in (8) and (10), $a \ge 0$, then $g_r > 0$; and we may take for p any real number except zero. Now, $M_p^p$ may be described as the weighted arithmetic mean of $(g_r/f_r)^p$ with positive weights $k_r f_r^p$. And hence $M_p$ is an internal mean of $x_0, x_1, \cdots, x_n$; that is

$$x_0 \le M_p \le x_n. \tag{35}$$

Furthermore, if in (8), $\phi(t) = t^q$, where q is any real number, then $M_p$ is a homogeneous mean of $x_0, x_1, \cdots, x_n$.
Another generalization may be obtained by writing

$$m_r = g_r/f_r, \tag{36}$$

$$M'_p = \left[\sum k_r m_r^p \Big/ \sum k_r\right]^{1/p}. \tag{37}$$

And still another

$$M'' = \left[m_1^{k_1} m_2^{k_2} \cdots m_n^{k_n}\right]^{1/\sum k_r}. \tag{38}$$

These means (37) and (38) are internal; and they are homogeneous, if F(x, y) in (29) is homogeneous.

The foregoing means are not, for n > 1, symmetrical functions of $x_0, x_1, \cdots, x_n$. Now the mere abandonment of (6) may lead to functions like (20) which are not means at all. But symmetry may be introduced as follows. First, lay aside (6), but suppose that the $x_r$ are all different. Then let

$$f_{r,s} = \int_{x_r}^{x_s} \phi(t)\,dt, \qquad g_{r,s} = \int_{x_r}^{x_s} t\,\phi(t)\,dt; \tag{39}$$

where $r = 0, 1, \cdots, (n - 1)$; $r < s \le n$. Then, let

$$U = \sum f_{r,s}^2, \qquad V = \sum g_{r,s}^2; \tag{40}$$

where U and V is each a sum of n(n - 1)/2 terms. Let W be the double-valued mean

$$W = \pm[V/U]^{1/2}. \tag{41}$$

Then W is a symmetric function of $x_0, x_1, \cdots, x_n$. If, in (8), $a' \le 0$, then in (12) each $g_r/f_r < 0$; and in (41) the negative value of W is an internal mean. But the positive radical is external. On the other hand, if $a \ge 0$; then $g_r/f_r > 0$; and the positive radical in (41) is internal. In this case, it may be well to use for W only the positive value of W.

In the more general case where a < 0 and a' > 0, the fractions $g_r/f_r$ may have different signs. But, in all cases, at least one of the two radicals (41) is an internal mean of $x_0, x_1, \cdots, x_n$. Moreover, W is homogeneous, if in (8), $\phi(t) = t^q$.
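The symmetric mean (41) can be sketched as follows, assuming $\phi(t) = t^q$, distinct positive arguments (the case $a \ge 0$), and the positive radical. The sums run over every unordered pair of the supplied points; names are chosen for the illustration.

```python
import math
from itertools import combinations

def power_integral(x, y, p):
    """Integral of t**p from x to y for positive x and y (signed if x > y)."""
    if p == -1:
        return math.log(y) - math.log(x)
    return (y ** (p + 1) - x ** (p + 1)) / (p + 1)

def W(xs, q):
    """Positive value of (41) with phi(t) = t**q, summing over all pairs of points."""
    U = sum(power_integral(a, b, q) ** 2 for a, b in combinations(xs, 2))      # sum of f_{r,s}^2
    V = sum(power_integral(a, b, q + 1) ** 2 for a, b in combinations(xs, 2))  # sum of g_{r,s}^2
    return math.sqrt(V / U)
```

Because each pair enters through a square, reordering the points only changes the sign of individual integrals, which is why W is symmetric where M of (13) is not.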

Finally, let

$$m_{r,s} = g_{r,s}/f_{r,s}, \tag{42}$$

(43) Z = ±{[Lm\,.ynin - 1)}”*. 

Then Z is symmetric; and at least one value is internal. If a > 0, we would naturally take Z > 0; and this Z is then an internal mean. Moreover, Z is homogeneous if the $m_{r,s}$ are homogeneous; that is, if F(x, y) in (29) is homogeneous for every x and y in I.



A STUDY OF R. A. FISHER'S z DISTRIBUTION AND THE RELATED
F DISTRIBUTION¹

By Leo A. Aroian
Hunter College


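1. Nature of the problem. Consider two samples of $N_1$ and $N_2$ drawings, each sample drawn from one of two populations consisting of variates normally distributed with equal population variances $\sigma^2$. We define the two sample means $\bar{x}_1 = \sum x_i / N_1$, $\bar{x}_2 = \sum x_j / N_2$, $x_i$'s and $x_j$'s independent variates. We calculate from the two samples

$$s_1^2 = \frac{\sum (x_i - \bar{x}_1)^2}{n_1} \quad \text{and} \quad s_2^2 = \frac{\sum (x_j - \bar{x}_2)^2}{n_2}, \qquad n_1 = N_1 - 1, \quad n_2 = N_2 - 1.$$

The distribution of $z = \frac{1}{2} \log \frac{s_1^2}{s_2^2}$ is well known:

$$P(z) = \frac{2\, n_1^{n_1/2}\, n_2^{n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \cdot \frac{e^{n_1 z}}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1 + n_2)}}\, dz, \qquad -\infty < z < \infty. \tag{1.1}$$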


We shall denote the ordinates by y(z). The purpose of this study is to discuss the seminvariants of the z distribution and also to find useful approximations for them; to show that as $n_1$ and $n_2$ approach infinity in any manner whatever the distribution of z approaches normality; to find the upper bound of the absolute value of the difference between the distribution function of z and the function determined by the approximate seminvariants of the distribution of z for $n_1$ and $n_2$ large; to approximate the z distribution by the Type III distribution, the Gram-Charlier Type A series, and the logarithmic frequency curve; and finally to investigate the same properties with respect to the F distribution, where $F = e^{2z} = s_1^2/s_2^2$. The non-existence of the moments of F for certain values of $n_1$ and $n_2$ is noted and explained on the basis of the distribution of the quotient

X 


¹ Presented to the American Mathematical Society, September 10, 1938, New York City
in part; and to the Institute December 27, 1939 at Philadelphia.
2. General features of the z distribution. The z distribution is always unimodal, asymmetrical if $n_1 \ne n_2$, and symmetrical if $n_1 = n_2$. We see that interchanging $n_1$ and $n_2$ is the same as replacing z by $-z$. Fisher [7] noted that the two parameter family of curves includes as special cases the normal curve, the $\chi^2$ distribution, and Student's distribution. The mode is at z = 0, the maximum ordinate is

$$y(0) = \frac{2\, n_1^{n_1/2}\, n_2^{n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right) (n_1 + n_2)^{\frac{1}{2}(n_1+n_2)}}$$


or approximately 


(2.1) = + form and large. 

The two points of inflection are 

(2.2) 2 = J log + n 2 dz \/ n\ + n\ + 2 n\ n 2 + 2 n in\ + 2 ninX 

I n\n2 ) 

They are equidistant from the mode, a property also of the Pearson system of frequency curves [24]. Also $\lim_{z \to \pm\infty} \dfrac{d^n y(z)}{dz^n} = 0$.
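The general features listed in this section can be checked numerically from the density (1.1). The sketch below is an illustration only: the function names, the integration range, and the midpoint rule are assumptions, and the ordinate is evaluated through logarithms to avoid overflow in the powers of $n_1$ and $n_2$.

```python
import math

def z_density(z, n1, n2):
    """Ordinate y(z) of Fisher's z distribution (1.1)."""
    log_beta = math.lgamma(n1 / 2) + math.lgamma(n2 / 2) - math.lgamma((n1 + n2) / 2)
    log_y = (math.log(2) + 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2) - log_beta
             + n1 * z - 0.5 * (n1 + n2) * math.log(n1 * math.exp(2 * z) + n2))
    return math.exp(log_y)

def integrate(f, lo, hi, steps=20_000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

total = integrate(lambda z: z_density(z, 4, 7), -12.0, 12.0)
print(total)                                            # close to 1
print(z_density(0.3, 4, 7), z_density(-0.3, 7, 4))      # interchange of n1, n2 mirrors z
```

The second line of output illustrates the remark that interchanging $n_1$ and $n_2$ is the same as replacing z by $-z$.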

3. The moment generating function and seminvariants. The moment generating function of the z distribution is

$$M_z(\theta) = \left(\frac{n_2}{n_1}\right)^{\theta/2} \frac{B\!\left(\frac{n_1 + \theta}{2}, \frac{n_2 - \theta}{2}\right)}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)}. \tag{3.1}$$

The seminvariants of Thiele are defined by the following identity in $\theta$:

$$\log M_z(\theta) = \lambda_1 \theta + \frac{\lambda_2 \theta^2}{2!} + \frac{\lambda_3 \theta^3}{3!} + \cdots. \tag{3.2}$$

To find $\lambda_r$ we take the logarithm of the moment generating function, expand it in powers of $\theta$ and choose the coefficient of $\theta^r / r!$. A complete discussion of properties of seminvariants may be found elsewhere [4].


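4. The seminvariants of z. Now by the following formulas [11] p. 38:

$$\log \Gamma(1 + x) = -s_1 x + \frac{s_2 x^2}{2} - \frac{s_3 x^3}{3} + \frac{s_4 x^4}{4} - \cdots, \qquad |x| < 1, \tag{4.1}$$

$$\log \Gamma(1 - x) = s_1 x + \frac{s_2 x^2}{2} + \frac{s_3 x^3}{3} + \frac{s_4 x^4}{4} + \cdots, \qquad |x| < 1, \tag{4.2}$$

where in both formulas $s_1$ is Euler's constant and

$$s_n = \frac{1}{1^n} + \frac{1}{2^n} + \frac{1}{3^n} + \frac{1}{4^n} + \cdots, \qquad n \ge 2.$$

$$\log B\!\left(\tfrac{1}{2}[1 + x], \tfrac{1}{2}\right) = \log \pi - \sigma_1 x + \frac{\sigma_2 x^2}{2} - \frac{\sigma_3 x^3}{3} + \frac{\sigma_4 x^4}{4} - \cdots, \qquad |x| < 1, \tag{4.3}$$

where $\sigma_1 = \log 2$ and

$$\sigma_n = \frac{1}{1^n} - \frac{1}{2^n} + \frac{1}{3^n} - \frac{1}{4^n} + \cdots, \qquad n \ge 2.$$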


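Hence from (4.1) and (4.3)

$$\log \Gamma\!\left(\frac{1 + x}{2}\right) = \frac{1}{2} \log \pi - x\left(\sigma_1 + \frac{s_1}{2}\right) + \frac{x^2}{2}\left(\sigma_2 + \frac{s_2}{4}\right) - \frac{x^3}{3}\left(\sigma_3 + \frac{s_3}{8}\right) + \frac{x^4}{4}\left(\sigma_4 + \frac{s_4}{16}\right) - \cdots. \tag{4.4}$$

Since $\sigma_n = s_n\left(1 - \frac{1}{2^{n-1}}\right)$, $n \ge 2$, we may write (4.4) as

$$\log \Gamma\!\left(\frac{1 + x}{2}\right) = \frac{1}{2} \log \pi - x\left(\sigma_1 + \frac{s_1}{2}\right) + \sum_{k=2}^{\infty} (-1)^k \frac{x^k}{k}\, s_k\left(1 - \frac{1}{2^k}\right). \tag{4.5}$$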

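From (3.1)

$$\log M_z(\theta) = \log \Gamma\!\left(\frac{n_1 + \theta}{2}\right) + \log \Gamma\!\left(\frac{n_2 - \theta}{2}\right) + \frac{\theta}{2}(\log n_2 - \log n_1) - \log \Gamma\!\left(\frac{n_1}{2}\right) - \log \Gamma\!\left(\frac{n_2}{2}\right). \tag{4.6}$$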

The results assume slightly different forms for (A) $n_1$ and $n_2$ each even; (B) $n_1$ and $n_2$ each odd; (C) $n_1$ even, $n_2$ odd; (D) $n_1$ odd, $n_2$ even. The general formula for $\lambda_{r:z}$, for all cases is

$$\lambda_{r:z} = (r - 1)! \sum_{k=0}^{\infty} \left[\frac{(-1)^r}{(n_1 + 2k)^r} + \frac{1}{(n_2 + 2k)^r}\right], \qquad r \ge 2. \tag{4.7}$$

This result is not so useful from the point of view of numerical applications as the formulas which follow.
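Although the series (4.7) converges slowly, it is easy to evaluate by machine once the tail is estimated. The sketch below is an illustration under stated assumptions: the tail correction (an integral plus two Euler-Maclaurin terms) and the function names are choices made here, not part of the paper. It compares the series value of $\lambda_{2:z}$ with a second difference of the logarithm of the moment generating function (4.6).

```python
import math

def series_sum(n, r, terms=2000):
    """Sum over k >= 0 of 1/(n + 2k)**r: explicit head plus an estimated tail."""
    head = sum((n + 2 * k) ** -r for k in range(terms))
    m = n + 2 * terms
    tail = m ** (1 - r) / (2 * (r - 1)) + 0.5 * m ** -r + r * m ** (-r - 1) / 6
    return head + tail

def seminvariant(r, n1, n2):
    """lambda_{r:z} from (4.7), for r >= 2."""
    return math.factorial(r - 1) * ((-1) ** r * series_sum(n1, r) + series_sum(n2, r))

def log_mgf(theta, n1, n2):
    """log M_z(theta) as in (4.6); requires -n1 < theta < n2."""
    return (math.lgamma((n1 + theta) / 2) + math.lgamma((n2 - theta) / 2)
            + 0.5 * theta * (math.log(n2) - math.log(n1))
            - math.lgamma(n1 / 2) - math.lgamma(n2 / 2))

h = 1e-3
second_difference = (log_mgf(h, 5, 8) + log_mgf(-h, 5, 8)) / h ** 2  # log_mgf(0) = 0
print(seminvariant(2, 5, 8), second_difference)
```

The two printed numbers agree to several decimals, which supports the reading of (4.7) used here; the sign pattern for odd r (negative when $n_2 > n_1$) follows from the same function.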


(5.1) 


6. Case A, ni and n« each even. From (4.6) 
log r » log + log (■ 


nt — 4 — fi 


’) 


+ 


+ iog(i-0 + iogr(i-0. 

Now log fl = ~]C r ( — “s') • There will be ^ — 1 series of 

\ nt — 2/ jb-i K \nj — 2/ 2 

this sort, and only one series of the type log r ('-D'Ss©*”*''''’"'”' 

(4.1). In the above expansion and those succeeding, terms not involving ff are 
omitted, since such terms are not needed in finding the seminvariants of z. The 

series log F ^1 ~ ^ will always occur. Then 

log r (^') - -f ! [(^2) + (^4) + ■ • ■ 


( 6 . 2 ) 


or 


+ 


— s* 



( 5 . 3 , ><«r(V)-S?(l)‘-§s'S'(l)‘ 

We remark that the double sum is zero if nt == 2. Similarly 


(6.4) 


or 




+ 


-\fc 


+ 




- «*ls 



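By use of (5.3) and (5.5) we have for the seminvariants of z, when $n_1$ and $n_2$ are even

$$\lambda_{r:z} = \frac{(r-1)!}{2^r}\left\{\left(s_r - \sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^r}\right) + (-1)^r\left(s_r - \sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^r}\right)\right\}, \qquad r \ge 2. \tag{5.6}$$

For $\lambda_{1:z} = \bar{z}$ we have by (4.6), (4.3), and (4.6)

$$\lambda_{1:z} = \frac{1}{2}\left[\left(\log n_2 - \sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k}\right) - \left(\log n_1 - \sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k}\right)\right]. \tag{5.7}$$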

6. Case B, ni and nt odd. We have 




( 6 . 1 ) 


r - log (=14^') + log (^1^0 


+ 


+ iog(L_«) + iogr(L_J). 

Expanding log T ( — by (4.5) 

® V 2 y/ Liw Hm - 2)* k{n2 - 4)* 


( 6 . 2 ) 


00 


a 


However s*^l = p + + which we shall denote 

hereafter by . Hence (6.2) becomes 

(e.3) log r (-^0 + !■) + £ ^ - £ I “■£ " (^0‘. 


Also 

(6.4) 

and 

(6.5) 


log r (“4-0 - log ("-t±f:-0 + log (^±40 + 


,.„g(<40,..,r(14). 




+ 


»* 


+ ••• + 


a 


2)* ' (ni-4)* 


( 6 . 6 ) 


, ogr ( 4 - 0.-.(.,+10 


A(^1)V fg (- 1)*-““‘^‘-*> tf* 

k a (21 + 1)*. 


Combining both these results (6.3) and (6.6) we have 
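$$\lambda_{r:z} = (r-1)!\left\{\left(t_r - \sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{(2k+1)^r}\right) + (-1)^r\left(t_r - \sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{(2k+1)^r}\right)\right\}, \qquad r \ge 2, \tag{6.7}$$

$$\lambda_{1:z} = \bar{z} = \left(\frac{1}{2}\log n_2 - \sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{2k+1}\right) - \left(\frac{1}{2}\log n_1 - \sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{2k+1}\right). \tag{6.8}$$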

7. Cases C, D, and values of $s_k$, $\sigma_k$, $t_k$. The formulas for case C, $n_1$ even,
$n_2$ odd are

The results for case D, $n_1$ odd, $n_2$ even are

(7.3) X.:.-fr-l)l{i(..-‘g'j,) + (-l)'(..-‘g‘’g4,^.)}. r£2. 


1 J(W1— 8) 1 1 <1 

(7.4) L * -1e‘. 

2 fl\ jbaaO 2/C "T" 1 2 k"»l A/ 


We list the numerical values of $s_k$ and $t_k$, $k \le 10$. The values of $s_k$ are from Stieltjes [20].

(7.5)
s_1  = 0.57721 56649        s_6  = 1.01734 30620
s_2  = 1.64493 40668        s_7  = 1.00834 92774
s_3  = 1.20205 69032        s_8  = 1.00407 73562
s_4  = 1.08232 32337        s_9  = 1.00200 83928
s_5  = 1.03692 77551        s_10 = 1.00099 45751

(7.6)
σ_1 = log 2 = 0.69314 71806     t_6  = 1.00144 70767
t_2  = 1.23370 05501            t_7  = 1.00047 15487
t_3  = 1.05179 97903            t_8  = 1.00015 51790
t_4  = 1.01467 80316            t_9  = 1.00005 13452
t_5  = 1.00452 37628            t_10 = 1.00001 70413

By means of the formula $t_k = s_k\left(1 - \frac{1}{2^k}\right)$, k > 1, $t_k$ was calculated from $s_k$.

From the well known results for the Zeta function of Riemann f (s), [22], (p. 265, 
P. 267), 

(7.7) f,-..-g‘= ' .il, k>l. 



(7.8) 
(7.9) $t_k = \zeta(k)\left(1 - \dfrac{1}{2^k}\right)$.

8. The mean of the z distribution. From our previous formulas for $\bar{z}$ we prove that if $n_1 = n_2$, $\bar{z} = 0$, and $\bar{z} < 0$ for $n_2 > n_1$, $\bar{z} > 0$ for $n_1 > n_2$. The maximum absolute value of $\lambda_{1:z}$ will occur when $n_1 = 1$, $n_2 = \infty$, or $n_1 = \infty$, $n_2 = 1$, and from (7.4) or (6.8) we have $\max |\lambda_{1:z}| = \dfrac{s_1}{2} + \dfrac{1}{2} \log 2 = .6352$.


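9. Formulas for $\lambda_{2:z}$, $\mu_{2:z} = \lambda_{2:z}$, $\mu_{3:z} = \lambda_{3:z}$, $\lambda_{4:z}$, and $\mu_{4:z}$. We have four cases from (5.6), (6.7), (7.1), (7.3):

$$\lambda_{2:z} = .822467 - \frac{1}{4}\left(\sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^2} + \sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^2}\right), \qquad n_1, n_2 \text{ even}. \tag{9.1}$$

$$\lambda_{2:z} = 2.467401 - \left(\sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{(2k+1)^2} + \sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{(2k+1)^2}\right), \qquad n_1, n_2 \text{ odd}. \tag{9.2}$$

$$\lambda_{2:z} = 1.644934 - \left(\frac{1}{4}\sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^2} + \sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{(2k+1)^2}\right), \qquad n_1 \text{ even},\ n_2 \text{ odd}. \tag{9.3}$$

$$\lambda_{2:z} = 1.644934 - \left(\sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{(2k+1)^2} + \frac{1}{4}\sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^2}\right), \qquad n_1 \text{ odd},\ n_2 \text{ even}. \tag{9.4}$$

In all cases of course $\lambda_{2:z} > 0$ and moreover $\lambda_{2:z} \to 0$ as $n_1$ and $n_2 \to \infty$. We list

$$\lambda_{3:z} = \frac{1}{4}\left(\sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^3} - \sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^3}\right), \qquad n_1, n_2 \text{ even}. \tag{9.5}$$

$$\lambda_{3:z} = 2\left(\sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{(2k+1)^3} - \sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{(2k+1)^3}\right), \qquad n_1, n_2 \text{ odd}. \tag{9.6}$$

$$\lambda_{3:z} = 1.803085 + \frac{1}{4}\sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^3} - 2\sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{(2k+1)^3}, \qquad n_1 \text{ even},\ n_2 \text{ odd}. \tag{9.7}$$

$$\lambda_{3:z} = -1.803085 + 2\sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{(2k+1)^3} - \frac{1}{4}\sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^3}, \qquad n_1 \text{ odd},\ n_2 \text{ even}. \tag{9.8}$$

$$\lambda_{4:z} = .811742 - \frac{3}{8}\left(\sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^4} + \sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^4}\right), \qquad n_1, n_2 \text{ even}. \tag{9.9}$$

$$\lambda_{4:z} = 12.176136 - 6\left(\sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{(2k+1)^4} + \sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{(2k+1)^4}\right), \qquad n_1, n_2 \text{ odd}. \tag{9.10}$$

$$\lambda_{4:z} = 6.493939 - 6\left(\sum_{k=0}^{\frac{1}{2}(n_2-3)} \frac{1}{(2k+1)^4} + \frac{1}{16}\sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^4}\right), \qquad n_1 \text{ even},\ n_2 \text{ odd}. \tag{9.11}$$

$$\lambda_{4:z} = 6.493939 - 6\left(\frac{1}{16}\sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^4} + \sum_{k=0}^{\frac{1}{2}(n_1-3)} \frac{1}{(2k+1)^4}\right), \qquad n_1 \text{ odd},\ n_2 \text{ even}. \tag{9.12}$$

We see $\lambda_{r:z} > 0$ whenever r is even. If r is odd $\lambda_{r:z} < 0$ if $n_2 > n_1$, and $\lambda_{r:z} > 0$ if $n_1 > n_2$. Also $\mu_{r:z} > 0$, $n_1 > n_2$, r odd, greater than one. Similarly $\mu_{r:z} < 0$, r odd > 1, $n_2 > n_1$.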
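The leading constants in the formulas of this section follow from the tabulated $s_k$ and $t_k$, and the finite sums are convenient to program. The sketch below is a hedged illustration: it assumes that each seminvariant is $(r-1)!$ times a signed pair of sums of $1/(n+2k)^r$, with the sum for even n expressed through $s_r$ and for odd n through $t_r$; the dictionary and function names are invented for the example.

```python
import math

S = {2: 1.6449340668, 3: 1.2020569032, 4: 1.0823232337}  # s_k as tabulated in (7.5)
T = {2: 1.2337005501, 3: 1.0517997903, 4: 1.0146780316}  # t_k as tabulated in (7.6)

def tail(n, r):
    """Sum over k >= 0 of 1/(n + 2k)**r, written with the finite sums of this section."""
    if n % 2 == 0:
        return (S[r] - sum(k ** -r for k in range(1, n // 2))) / 2 ** r
    return T[r] - sum((2 * k + 1) ** -r for k in range((n - 1) // 2))

def seminvariant(r, n1, n2):
    """lambda_{r:z} for r = 2, 3, 4 and any parities of n1, n2."""
    return math.factorial(r - 1) * ((-1) ** r * tail(n1, r) + tail(n2, r))

print(seminvariant(2, 2, 2))   # .822467, the constant of the even-even case
print(seminvariant(2, 1, 1))   # 2.467401, the constant of the odd-odd case
print(seminvariant(3, 2, 1))   # 1.803085, n1 even and n2 odd
print(seminvariant(4, 2, 1))   # 6.493939, n1 even and n2 odd
```

Choosing the smallest admissible $n_1$ and $n_2$ makes every finite sum empty, so each call returns the bare constant of the corresponding formula.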

10. Skewness, excess, and values of $\alpha_{n:z}$. We take for our measure of skewness $\alpha_3 = \dfrac{\mu_3}{\mu_2^{3/2}} = \dfrac{\lambda_3}{\lambda_2^{3/2}}$. For $n_2 > n_1$, $\alpha_3 < 0$. Further the skewness increases negatively if $n_1$ remains constant as $n_2 \to \infty$. Thus negative skewness will be a maximum for $n_2 = \infty$, $n_1 = 1$, and positive skewness will be a maximum when $n_2 = 1$, $n_1 = \infty$. The absolute value of maximum $\alpha_3$ is

$$|\alpha_3| = \frac{2t_3}{t_2^{3/2}} = 1.5351. \tag{10.1}$$

As our measure of kurtosis we use $\alpha_4 = \dfrac{\mu_4}{\mu_2^2} = 3 + \dfrac{\lambda_4}{\lambda_2^2}$. As a measure of excess, E, we use $E = \alpha_4 - 3 = \dfrac{\lambda_4}{\lambda_2^2}$. The excess is always positive.
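The two extreme values quoted so far, the largest $|\lambda_{1:z}|$ of Section 8 and the largest $|\alpha_3|$ of (10.1), can be recomputed from the tabulated constants. This is a small sketch assuming the limiting case $n_1 = 1$, $n_2 = \infty$, where $\lambda_{2:z} = t_2$ and $\lambda_{3:z} = -2t_3$; the variable names are chosen here.

```python
import math

s1 = 0.5772156649   # Euler's constant, s_1 in (7.5)
t2 = 1.2337005501   # t_2 in (7.6)
t3 = 1.0517997903   # t_3 in (7.6)

max_abs_mean = s1 / 2 + 0.5 * math.log(2)   # largest |lambda_1|
max_abs_skewness = 2 * t3 / t2 ** 1.5       # largest |alpha_3|, formula (10.1)

print(round(max_abs_mean, 4), round(max_abs_skewness, 4))
```

Both values round to the figures given in the text.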


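11. Approximations for $\lambda_{r:z}$ by the Euler-Maclaurin sum formula. The exact results given previously for the seminvariants become unwieldy for $n_1$ and $n_2$ large. Hence we develop useful approximations for the seminvariants, and give the maximum error of the approximation. We find first our results for $\lambda_{r:z}$ when $n_1$ and $n_2$ are even and r > 1. We begin with (5.6)

$$\lambda_{r:z} = \frac{(r-1)!}{2^r}\left[\left(s_r - \sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k^r}\right) + (-1)^r\left(s_r - \sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k^r}\right)\right]$$

and rewrite this as

$$\lambda_{r:z} = \frac{(r-1)!}{2^r}\left[\sum_{k=\frac{1}{2}n_2}^{\infty} \frac{1}{k^r} + (-1)^r \sum_{k=\frac{1}{2}n_1}^{\infty} \frac{1}{k^r}\right]. \tag{11.1}$$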

Now find the two sums of (11.1) by the Euler-Maclaurin sum formula [21] using the first three terms, and obtain

$$\lambda_{r:z} = \frac{(r-2)!}{2}\left[\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r} + \frac{r(r-1)}{3}\left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right) - \frac{r(r-1)(r+1)(r+2)}{45}\left(\frac{1}{n_2^{r+3}} + \frac{(-1)^r}{n_1^{r+3}}\right)\right]. \tag{11.2}$$
We use the following theorem [10] (p. 539), to find the error:

If f(x) is of constant sign for x > 0, and together with all of its derivatives, tends monotonely to zero as $x \to \infty$, Euler's summation formula may be stated in the simplified form

Z/. » f fix) dx + K/- + /o) + ll (/» - /i) 4- • • • 

»mO Jo JSl 


_J__ (—1)* /■ ,(»*+!) /(St+1)\ 

^ " i2k)\ ^ + ^ 

where 0 < 0< 1 and fij = 1/6, = 1/30, Bt = 1/42, B» = 1/30, Bw = 5/66, 

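etc. If we use

$$\lambda_{r:z} = \frac{(r-2)!}{2}\left[\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right] \tag{11.3}$$

then the error committed is of the same sign and less than

$$\frac{r!}{3!}\left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right).$$

If we take

$$\lambda_{r:z} = \frac{(r-2)!}{2}\left[\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r} + \frac{r(r-1)}{3}\left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right)\right] \tag{11.4}$$

then our error is less than, and has the same sign as

$$-\frac{(r+2)!}{90}\left(\frac{1}{n_2^{r+3}} + \frac{(-1)^r}{n_1^{r+3}}\right).$$

Finally if we use (11.2), our error has the same sign as, and is less than

$$\frac{(r+4)!}{945}\left(\frac{1}{n_2^{r+5}} + \frac{(-1)^r}{n_1^{r+5}}\right).$$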

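12. Approximations for other values of $n_1$ and $n_2$, r > 1. Now in case $n_1$ and $n_2$ are odd we have from (6.7)

$$\lambda_{r:z} = (r-1)!\left(\sum_{k=\frac{1}{2}(n_2-1)}^{\infty} \frac{1}{(2k+1)^r} + (-1)^r \sum_{k=\frac{1}{2}(n_1-1)}^{\infty} \frac{1}{(2k+1)^r}\right). \tag{12.1}$$

Applying the Euler-Maclaurin sum formula to each of the sums in (12.1) we are led to exactly the same results given in paragraph (11). The other cases are obvious combinations of the sums in (11.1) and (12.1), and so for all values of $n_1$ and $n_2$ the approximate results for $\lambda_{r:z}$, r > 1 are

$$\lambda_{r:z} = \frac{(r-2)!}{2}\left[\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r} + \frac{r(r-1)}{3}\left(\frac{1}{n_2^{r+1}} + \frac{(-1)^r}{n_1^{r+1}}\right) - \frac{r(r-1)(r+1)(r+2)}{45}\left(\frac{1}{n_2^{r+3}} + \frac{(-1)^r}{n_1^{r+3}}\right)\right]. \tag{12.2}$$

Formulas (11.1) and (12.1) prove the result previously given for $\lambda_{r:z}$ (4.7).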


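13. The approximate values of $\lambda_{1:z}$. From (5.7)

$$\lambda_{1:z} = \frac{1}{2}\left[\left(\log n_2 - \sum_{k=1}^{\frac{1}{2}(n_2-2)} \frac{1}{k}\right) - \left(\log n_1 - \sum_{k=1}^{\frac{1}{2}(n_1-2)} \frac{1}{k}\right)\right].$$

We use the Euler-Maclaurin sum formula on the sum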


1 / 1 \ _ 2 
h k^\h Vfc + 1/ 


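and the similar sum involved in $\lambda_{1:z}$. Hence we have

$$\lambda_{1:z} = \frac{1}{2}\left(\frac{1}{n_2} - \frac{1}{n_1}\right) + \frac{1}{6}\left(\frac{1}{n_2^2} - \frac{1}{n_1^2}\right) - \frac{1}{15}\left(\frac{1}{n_2^4} - \frac{1}{n_1^4}\right), \qquad n_1 \text{ and } n_2 \text{ even}, \quad n_1, n_2 > 2. \tag{13.1}$$

The errors committed by using one, two, or three terms of (13.1) are less than, and of the same sign respectively as

$$\frac{1}{6}\left(\frac{1}{n_2^2} - \frac{1}{n_1^2}\right), \qquad -\frac{1}{15}\left(\frac{1}{n_2^4} - \frac{1}{n_1^4}\right), \qquad \frac{8}{63}\left(\frac{1}{n_2^6} - \frac{1}{n_1^6}\right).$$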

For $n_1$ and $n_2$ both odd we find the same result as (13.1). The restriction $n_1, n_2 > 2$, may easily be replaced by $n_1, n_2 \ge 2$ (for $n_1$, $n_2$ even) and $n_1, n_2 \ge 1$ (for $n_1$, $n_2$ both odd). When $n_1$ is odd, $n_2$ even, the formula is again the same as (13.1) if $n_1$ and $n_2$ are sufficiently large; but if $n_1$ and $n_2$ are small we find in this case



Another method of finding (12.2) would have been to use the asymptotic expression for $\log \Gamma(x)$.
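A short numerical comparison shows how quickly (13.1) approaches the exact mean for even $n_1$ and $n_2$. In this sketch the exact value comes from the finite harmonic sums of (5.7); the third error bound, with coefficient 8/63, is the next term of the asymptotic series for the digamma function and is an assumption here, since that bound is not legible in the source. Function names are chosen for the example.

```python
import math

def lambda1_exact_even(n1, n2):
    """Exact mean of z from (5.7), for even n1 and n2."""
    def part(n):
        return math.log(n) - sum(1 / k for k in range(1, n // 2))
    return 0.5 * (part(n2) - part(n1))

def lambda1_pieces(n1, n2):
    """The three terms of (13.1) followed by the assumed next term of the series."""
    return [
        0.5 * (1 / n2 - 1 / n1),
        (1 / n2 ** 2 - 1 / n1 ** 2) / 6,
        -(1 / n2 ** 4 - 1 / n1 ** 4) / 15,
        8 * (1 / n2 ** 6 - 1 / n1 ** 6) / 63,
    ]

exact = lambda1_exact_even(10, 20)
pieces = lambda1_pieces(10, 20)
for used in (1, 2, 3):
    error = exact - sum(pieces[:used])
    print(used, error, pieces[used])   # error next to the first omitted term
```

For $n_1 = 10$, $n_2 = 20$ each error has the sign of the first omitted term and is smaller in absolute value.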


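14. Approximate values of $\lambda_{r:z}$ for values of r. We list the approximate values of $\lambda_{r:z}$ to three terms.

$$\begin{aligned} \lambda_{2:z} &= \frac{1}{2}\left(\frac{n_1 + 1}{n_1^2} + \frac{n_2 + 1}{n_2^2}\right) + \frac{1}{3}\left(\frac{1}{n_1^3} + \frac{1}{n_2^3}\right) - \frac{4}{15}\left(\frac{1}{n_1^5} + \frac{1}{n_2^5}\right), \\ \lambda_{3:z} &= \frac{1}{2}\left(\frac{n_2 + 2}{n_2^3} - \frac{n_1 + 2}{n_1^3}\right) + \left(\frac{1}{n_2^4} - \frac{1}{n_1^4}\right) - \frac{4}{3}\left(\frac{1}{n_2^6} - \frac{1}{n_1^6}\right), \\ \lambda_{4:z} &= \left(\frac{n_2 + 3}{n_2^4} + \frac{n_1 + 3}{n_1^4}\right) + 4\left(\frac{1}{n_2^5} + \frac{1}{n_1^5}\right) - 8\left(\frac{1}{n_2^7} + \frac{1}{n_1^7}\right). \end{aligned} \tag{14.1}$$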

The approximate values given by Cornish and Fisher [8] (p. 319), are similar, but have fewer terms. Cornish and Fisher give no remainder term. From (14.1) and (12.2) we see the maximum absolute values of $\lambda_{2r+1:z}$, $r \ge 1$, occur when $n_2 = \infty$, $n_1 = 1$, or $n_2 = 1$, $n_1 = \infty$. Similarly $\lambda_{2r:z}$, $r \ge 1$, has its maximum value for $n_1 = n_2 = 1$. The standard seminvariants of z are defined

(nt = \ , r ^ 2. We also note that for n* > ni , &,+!:, < 0, r ^ 1 and hence 
XJ 

a»r+i < 0 also where a„ = . Moreover the maximum absolute values of 

M2 

{sr:* and ferfi:* occur when ni=l,n 2 = ooorn 2 =l, ni~ co; and also for a 2 r 
and a 2 r+i . Approximately then 

(14.2) max = ( - 1)*" , r ^ 2. 


The .exact value for maximum a 4 :, is 3 H — = 

ft 


7.07. 


15. Approach to normality of the z distribution. We prove the theorem: The distribution of z approaches normality as $n_1$ and $n_2 \to \infty$ in any manner whatever, with $\bar{z} = \frac{1}{2}\left(\frac{1}{n_2} - \frac{1}{n_1}\right)$, $\sigma_z^2 = \frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. We also find an upper bound of the absolute value of the difference between the z distribution and the function determined by the approximate seminvariants of z when $n_1$ and $n_2$ become large. To prove the theorem we start with the original distribution of z, and find when $n_1$ and $n_2$ are large,

(16.1) 


Piz) 



i(ni-Hii) 

e"“ (fe. 


We change to standard units * *= fcr, -1- f, then 
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(16.2) Pit) = 




ni + n, 

„ ju*-m I „ I ® 

.nie + n%) 


« < < < ». 


We rewrite this as 


(15.3) Pit) 


«i + Wj 


K»i+"i) 


gin, («,+»)/<„, +n,)^^^g-*n.(«,+l)/<«,+».)J 

Expand and «'+»)'<»>+"*> and add term by term. Divide 

this result by ni + nj from the numerator of P(t) to obtain 


(15.4) 


Hence 


(15.5) 


2ni n,(lff + i) 


1 4 - ^ I •'j ^ Q J 

(Wl + »2)* ' \(Wl + W 2 )*/ 


P(0 = 


1 + - 


2ni niita + 2 )' 


sSi'l-Kni+n,) 


ini + ntY 


We evaluate (15.5) for ni and n* large by using logarithms. 
«i + »s i-_ /i I 2ni mito + if 


log < 1 + 


(ni + thY 

Wi + Wj r/2wi«2(<o’ + 2)*\ 1 f2nins(<<r + 2 )* 


r2nin2i 

(«i 


2 \ (ni + 71,)“ 




in,iUr + if Y _ 
(ni + nj)’ / _ ' 


This gives 


*’'*<'** i _L o< s _i_ si^ /■< _i_ b\< -L ^ ^ i\>’ {2wins(/(7 + 2)“r 

- _ (1 . + 2te + * ) + (,, + 8 + g (- 1) . 


We reduce this then to 


‘ -I 5 , 

---a ir 


( 2 <r-^)^ if 2n\nl ) ««r + 2)^ 
2 "^2 \(ni + n?)*/ nj + n. 


+ terms involved in the above summation. Let U — a ’’l < <r. Since 

zV* lY 

lim a — 0, lim 11 = 0. Similarly lim — ^ = lim -» = 0. Con- 

ni,n2-»oo ni,n2“*«o ni,n2-^«o « ni,n2’**oe ^ 

n?n| ,, , ,s4 c~*it<T + S)* it+Uf „ ,. it+Uf 

(«! + ^ 2 )* 4(ni ■+■ 712 ) 4(71i + 71,) ni.n,-»oo 4(7li -j- 7^) 

0. In like fashion 

^ (-1)^ / 2711712 y ita + S)^' _ ^ i-iy<r-’‘'iUT + i)^ 

^ 2r Vi + ntj ini -|- 7»2)'-‘ ^ 2r(7ii + ti,)'^^ 

Now clearly from our previous discussion for r = 2, we see 
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lim S ^ 

i»ii»»-»«» r— * 2r (fii 4* 



This completes the proof.

We now consider the function, f(z), determined by the approximate seminvariants of z. We start with

$$\lambda_{1:z} = \frac{1}{2}\left(\frac{1}{n_2} - \frac{1}{n_1}\right) \quad \text{and} \quad \lambda_{r:z} = \frac{(r-2)!}{2}\left[\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right], \qquad r > 1,$$

from (12.2) using only the first term. We may easily prove then that as $n_1$ and $n_2$ approach infinity in any manner whatever the function f(z) represents a normal frequency distribution with

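$$\bar{z} = \frac{1}{2}\left(\frac{1}{n_2} - \frac{1}{n_1}\right) \quad \text{and} \quad \sigma_z^2 = \frac{1}{2}\left(\frac{n_2 + 1}{n_2^2} + \frac{n_1 + 1}{n_1^2}\right).$$

This further shows the identity of f(z) and y(z) in the limit as $n_1$ and $n_2 \to \infty$. Since the moment generating function of f(z) is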


we have 
(15,6) f{z) 


fi - * Yi + 

\ ^2/ \ ni/ 






1 +.^) 

Tlx/ 


i(ni-l-Hfl) 


de. 


00 . 


I have not been able to evaluate (15.6). We instead shall find an upper bound to the difference $|f(z) - y(z)|$ as $n_1$ and $n_2$ become large. We form $f(z) - y(z)$. Then by use of Stirling's formula for n! with the remainder term and by the Fourier Integral Theorem,

(15.7) I /(«) - y{z) I g - i)y(t) ^here 0 < ft < 1, 0 < /J 4 < 1, 


and 


(15.8) $\lim_{n_1, n_2 \to \infty} |f(z) - y(z)| = 0$, and for this case $f(z) = y(z)$.

Of course (15.7) furnishes the upper bound of the absolute value between the frequency distribution of z and the function determined by the approximate seminvariants of z for any values of $n_1$ and $n_2$.
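The approach to normality can be illustrated numerically by comparing the ordinate (1.1) with a normal curve having the limiting mean $\frac{1}{2}(1/n_2 - 1/n_1)$ and variance $\frac{1}{2}(1/n_1 + 1/n_2)$. The sketch below is only an illustration: the grid of plus or minus five standard deviations, the scaling by the normal peak, and the function names are all assumptions made here.

```python
import math

def z_density(z, n1, n2):
    """Ordinate y(z) of Fisher's z distribution (1.1), computed through logarithms."""
    log_beta = math.lgamma(n1 / 2) + math.lgamma(n2 / 2) - math.lgamma((n1 + n2) / 2)
    log_y = (math.log(2) + 0.5 * n1 * math.log(n1) + 0.5 * n2 * math.log(n2) - log_beta
             + n1 * z - 0.5 * (n1 + n2) * math.log(n1 * math.exp(2 * z) + n2))
    return math.exp(log_y)

def normal_density(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def max_relative_gap(n1, n2, points=2001):
    """Largest |y(z) - normal(z)| on a grid, divided by the normal peak."""
    mean = 0.5 * (1 / n2 - 1 / n1)
    var = 0.5 * (1 / n1 + 1 / n2)
    sd = math.sqrt(var)
    peak = normal_density(mean, mean, var)
    grid = [mean + sd * (-5 + 10 * i / (points - 1)) for i in range(points)]
    return max(abs(z_density(z, n1, n2) - normal_density(z, mean, var)) for z in grid) / peak

for n1, n2 in ((4, 2), (40, 20), (400, 200)):
    print(n1, n2, max_relative_gap(n1, n2))
```

The printed gap shrinks as both degrees of freedom grow, in keeping with the theorem of this section.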

Up to this point we have assumed that there exists a function determined by 
the seminvariants 


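$$\lambda_{1:z} = \frac{1}{2}\left(\frac{1}{n_2} - \frac{1}{n_1}\right) \quad \text{and} \quad \lambda_{r:z} = \frac{(r-2)!}{2}\left[\frac{n_2 + r - 1}{n_2^r} + (-1)^r \frac{n_1 + r - 1}{n_1^r}\right].$$

This may readily be proved by using the following theorem [18] (p. 536): The determined character of the moments problem for an infinite interval is insured if $\sum c_{2n}^{-1/2n}$ diverges $\left(c_n = \int_{-\infty}^{\infty} x^n\, dF(x)\right)$.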


16. The Pearson types of approximating curve. In discussing the types of the Pearson system which may be expected to approximate the z distribution we shall use the results of H. C. Carver [1], and the further exposition of C. C. Craig [3]. To find the Pearson type we compute $\delta = \dfrac{2\alpha_4 - 3\alpha_3^2 - 6}{\alpha_4 + 3}$. We shall find it convenient to use the approximations $\alpha_3 = \dfrac{\sqrt{2}\,(n_1 - n_2)}{\sqrt{n_1 n_2 (n_1 + n_2)}}$ and $\alpha_4 = 3 + \dfrac{4(n_1^2 - n_1 n_2 + n_2^2)}{n_1 n_2 (n_1 + n_2)}$ to obtain

$$\delta = \frac{(n_1 + n_2)^2}{3n_1^2 n_2 + 3n_1 n_2^2 + 2n_1^2 - 2n_1 n_2 + 2n_2^2} \tag{16.1}$$

and consequently $0 < \delta \le \frac{1}{2}$. The only possibilities are Types IV, VII, VI, or V since the greatest value of $\alpha_3^2$ by (14.1) is 2.3565. Now if $n_1 = n_2$, we have Type VII, since $\alpha_3 = 0$, $\delta > 0$. In all other cases we shall have Types IV, V, or VI according as $\alpha_3^2 < 4\delta(\delta + 2)$, $\alpha_3^2 = 4\delta(\delta + 2)$, $\alpha_3^2 > 4\delta(\delta + 2)$. We neglect $\delta^2$. Hence $\alpha_3^2 < 8\delta$ implies

$$n_2^4(n_1 - 2) + n_2^3(15n_1^2 + 6n_1) + n_2^2(15n_1^3 - 8n_1^2) + n_2(n_1^4 + 6n_1^3) - 2n_1^4 > 0. \tag{16.2}$$

A simple investigation reveals then the following results:

(16.3)
Type IV for $n_1, n_2 \ge 2$, $n_1 \ne n_2$.
Type IV for $n_1 = 1$, $1 < n_2 \le 21$; or $n_2 = 1$, $1 < n_1 \le 21$.
Type VI for $n_1 = 1$, $n_2 \ge 22$; or for $n_2 = 1$, $n_1 \ge 22$.
Type VII for $n_1 = n_2$.

Clearly the z distribution has features comparable to Type IV since both have 
infinite range. However, Type IV is irksome to fit in practice. 
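
The classification of this section can be checked numerically. The following Python sketch is not part of the original paper. It assumes the large-sample approximations for $\alpha_3$ and $\delta$ used above, neglects $\delta^2$ as the text does, and uses illustrative function names (`delta_approx`, `alpha3_squared_approx`, `pearson_type`) that are hypothetical. Exact rational arithmetic is used so that the comparison of $\alpha_3^2$ with $8\delta$ is not affected by rounding.

```python
from fractions import Fraction


def delta_approx(n1, n2):
    """Approximate criterion delta of (16.1), as an exact fraction."""
    numerator = (n1 + n2) ** 2
    denominator = (3 * n1**2 * n2 + 3 * n1 * n2**2
                   + 2 * n1**2 - 2 * n1 * n2 + 2 * n2**2)
    return Fraction(numerator, denominator)


def alpha3_squared_approx(n1, n2):
    """Square of the approximate skewness sqrt(2)(n1 - n2)/sqrt(n1 n2 (n1 + n2))."""
    return Fraction(2 * (n1 - n2) ** 2, n1 * n2 * (n1 + n2))


def pearson_type(n1, n2):
    """Pearson type suggested by comparing alpha3^2 with 8*delta (delta^2 neglected)."""
    if n1 == n2:
        return "VII"  # alpha3 = 0 and delta > 0
    a3_sq = alpha3_squared_approx(n1, n2)
    bound = 8 * delta_approx(n1, n2)
    if a3_sq < bound:
        return "IV"
    if a3_sq == bound:
        return "V"
    return "VI"


if __name__ == "__main__":
    print(float(delta_approx(10, 5)))  # 0.09375, quoted as .094 in Section 17
    for pair in [(2, 3), (1, 21), (1, 22), (22, 1), (5, 5)]:
        print(pair, pearson_type(*pair))
```

Under these assumptions the boundary between Type IV and Type VI for $n_1 = 1$ falls between $n_2 = 21$ and $n_2 = 22$, in agreement with (16.3).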


17. The Type III approximating curve, the logarithmic curve, and the
Gram-Charlier Type A. The criterion for Type III is $\delta = 0$, $\alpha_3^2 \ne 0$. We see
that as $n_1$ and $n_2$ increase the value of $\delta$ will decrease. Even for small values
of $n_1$ and $n_2$ Type III will furnish a fair approximation to the z distribution.
For example $n_1 = 10$, $n_2 = 5$, $\delta = .094$. The advantage of the Type III
approximation rests on the fact that Salvosa's tables may be used. From the chart in
[16] once $\alpha_3^2 \le 2.3565$, we are assured that the approximating Type III curve
is bell shaped. For $n_1 = 1, 2$, $n_2$ = any value, this approximation is not all
that could be desired, although even in such cases it does have value. We note

that Type III has limited range at one extreme , <»^ while the range of 

the z distribution is $(-\infty, \infty)$. Salvosa's tables extend as far as $\alpha_3 = 1.1$,
and since max $\alpha_3 = 1.5351$, we see in some cases, and these only for $n_1 = 1$,
$n_2$ large, we shall be obliged to make use of Pearson's Tables of the Incomplete
Gamma Function [14]. The logarithmic frequency curve


will be useful in approximating the z distribution. While it has been discussed
by many authors we shall follow Pae-Tsi Yuan [23], where a full bibliography
may be found. In our discussion we use the $\beta_1 = \alpha_3^2$, $\beta_2 = \alpha_4$ chart of the
Pearson system as given by S. J. Pretorius [16] (p. 147), since the logarithmic
frequency locus connecting $\alpha_3^2$ and $\alpha_4$ is already drawn in. The justification of
this curve for fitting is due to the fact that in the $\beta_1$, $\beta_2$ chart of the Pearson
system as given by S. J. Pretorius [16] (p. 147), the logarithmic frequency locus
lies in the Type VI region between the Type III locus and the Type V locus,
and consequently closer to the Type IV region than Type III itself does. Hence
since Type III fits fairly well under certain conditions and Type IV fits well we
can expect the same for the logarithmic curve. Furthermore when $\alpha_3$ is small
the logarithmic curve is similar to Type III [23] (p. 42), and as $\alpha_3$ becomes
larger, $\alpha_3 = 1$, the difference between the two types is pronounced. However,
it is just when $\alpha_3$ becomes large in the region $n_1 = 1$, $n_2 \ge 22$ that we find the
logarithmic curves give a fine fit, since in such cases the point $(\alpha_3^2, \beta_2)$ lies
practically on the logarithmic locus [16]. To fit the curve [23] (pp. 37, 48, 49), we
find the values of the three parameters a, b, c. To find c we solve the equation
$w^3 + 3w^2 - (4 + \alpha_{3:z}^2) = 0$ for w, using the table [23] (p. 48) given by Pae-Tsi
Yuan. Knowing w we can easily solve for

c = (log w)\ 6 = 

(17.1) 

(«; + 2 )ff, -1 

where the value of x must be obtained from the table of areas under the normal 
curve, if the t distribution is approximated by use of areas. 

Since the Gram-Charlier Type A series generally approximates a Pearson
Type IV fairly well when $\alpha_3^2$ is not too large, it is to be expected that the Type A
series will approximate the z distribution in those cases when $n_1 = n_2$, and also
when $\alpha_3^2$ is not too large.


18. Levels of significance and approximation methods. We shall apply the
results of the previous paragraphs to the determination of the value of z for
any level of significance $\alpha$, i.e. the value of z such that $\int_{-\infty}^{z} p(z)\,dz = 1 - \alpha$.

We have such levels as the median (the 50% point of significance), the 20%, 
5%, 1%, and .1% points as given in [9]. Where these tables apply there is no 
need for other methods. It would be desirable to extend the results for any 
level of significance whatever. The methods which we shall use are (1) the 
logarithmic frequency curve, (2) the Gram-Charlier Type A, and (3) the Type III 
approximation. For finding the levels of significance by the Incomplete Beta 
function, the reader is referred to [13], (p. lviii, topic (viii)). The logarithmic
curve is very simple to use in conjunction with the table of areas under the 
normal curve. From Pae-Tsi Yuan we have 

(18.1) $t = \dfrac{e^{cx - c^2/2} - 1}{(e^{c^2} - 1)^{1/2}}$, where $(e^{c^2} - 1)^{1/2}$

takes the same sign as $\alpha_3$. The value of x is obtained from the table of the
normal curve, 1.64 for the 5% level, 2.33 for the 1% level; the value of c is
obtained from w (17.1), and consequently the value of t (18.1). Then we have,
if $z_\alpha$ = value of z for any level of significance, $t = \dfrac{z_\alpha - \bar{z}}{\sigma_z}$ to solve for $z_\alpha$, where $\bar{z}$
and $\sigma_z$ are the values of the mean and standard deviation of z as given by the
proper formulas in (5), (6), (7). We illustrate with examples:

(18.2) 5% point of z, $n_1 = \infty$, $n_2 = 1$. $\alpha_3 = 1.5351$, w = 1.2264, x = 1.64,
t = 1.88, $\bar{z} = .6352$, $\sigma_z = 1.11$, and as a result $z_{5\%} = 2.72$. Fisher [9] gives 2.7693.

We can also find $z_{5\%}$ easily for $n_1 = 1$, $n_2 = \infty$. Here $\alpha_3 = -1.5351$, w = 1.2264,
x = -1.64, t = 1.197, $\bar{z} = -.6352$, $\sigma_z = 1.11$, $z_{5\%} = .694$ compared with
Fisher [9] $z_{5\%} = .6729$.

(18.3) 1% point for $n_1 = 4$, $n_2 = 8$, $\bar{z} = -.0701$, $\sigma_z = .4819$, $\alpha_{3:z} = -.3619$,
w = 1.0144, t = 2.17 and $z_{1\%} = .976$, while the accurate result is .9734.

From experience the values of z for any level of significance obtained by the
logarithmic frequency curve will possess an error less than 2% of the true value of z
for the level of significance if $n_1$ and $n_2$ are greater than twenty. It would
seem that for other values of $n_1$ and $n_2$ the error could not be greater than 10%,
and usually would be much less.
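
The computation in (18.2) can be retraced with a short Python sketch. This is not the author's procedure but an illustration under stated assumptions: w is taken as the root, not less than 1, of $w^3 + 3w^2 - (4 + \alpha_3^2) = 0$ (found here by bisection instead of from Yuan's table), $c = (\log w)^{1/2}$, and t is taken as $(e^{cx - c^2/2} - 1)/(e^{c^2} - 1)^{1/2}$ with the root carrying the sign of $\alpha_3$. The function names are hypothetical.

```python
import math


def solve_w(alpha3):
    """Root w >= 1 of w^3 + 3w^2 - (4 + alpha3^2) = 0, by bisection."""
    target = alpha3 ** 2

    def cubic(w):
        # (w - 1)(w + 2)^2 expands to w^3 + 3w^2 - 4
        return (w - 1.0) * (w + 2.0) ** 2 - target

    low, high = 1.0, 1.0 + target / 9.0 + 1e-12  # cubic(low) <= 0 <= cubic(high)
    for _ in range(200):
        mid = 0.5 * (low + high)
        if cubic(mid) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def t_logarithmic(alpha3, x):
    """Standardized deviate t for the normal deviate x (alpha3 must be nonzero)."""
    if alpha3 == 0.0:
        raise ValueError("the logarithmic curve degenerates when alpha3 = 0")
    w = solve_w(alpha3)
    c = math.sqrt(math.log(w))
    signed_root = math.copysign(math.sqrt(w - 1.0), alpha3)  # sign of alpha3
    return (math.exp(c * x - 0.5 * c * c) - 1.0) / signed_root


def z_point(z_mean, sigma_z, alpha3, x):
    """Percentage point z = z_mean + t * sigma_z."""
    return z_mean + sigma_z * t_logarithmic(alpha3, x)


if __name__ == "__main__":
    print(solve_w(1.5351))                        # close to 1.2264
    print(z_point(0.6352, 1.11, 1.5351, 1.64))    # close to 2.72
    print(z_point(-0.6352, 1.11, -1.5351, -1.64)) # close to .694
```

With the inputs of (18.2) the sketch gives values that agree with the quoted w, t and $z_{5\%}$ to the number of figures printed there.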


19. The Gram-Charlier Type A. We take the series in the form

$F(t) = \varphi(t) + A_3 \varphi^{(3)}(t) + A_4 \varphi^{(4)}(t) + \cdots$, $\varphi(t) = \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$,

with $A_3 = -\alpha_{3:z}/3!$ and $A_4 = (\alpha_{4:z} - 3)/4!$.

Some examples follow.

(19.1) We use the material of (18.3) and employ three terms of F(t). $\bar{z} = -.0701$,
$\sigma_z = .4819$, $\lambda_{3:z} = -.0405$, $\lambda_{4:z} = .0336$, $A_3 = .06032$, $A_4 = .02596$.

Fitting F(t) by ordinates we have t = 2.17, and consequently z = .976.

(19.2) We take $n_1 = n_2 = 5$, $\bar{z} = 0$, $\sigma_z = .4952$, $\lambda_{3:z} = 0$, $\lambda_{4:z} = .02798$, $A_3 = 0$,
$A_4 = .01939$.

5% point: By ordinates t = 1.57, $z_{5\%} = .777$, while Fisher gives .8097.

1% point: By ordinates t = 2.325, $z_{1\%} = 1.15$, while Fisher gives 1.1974.

(19.3) We take $n_1 = 3$, $n_2 = 20$, $\bar{z} = -.15909$, $\sigma_z = .5099$, $\lambda_{3:z} = -.10222$,
$\lambda_{4:z} = .08822$, $A_3 = .12854$, $A_4 = .05438$. By ordinates t = 1.523, $z_{5\%} = .618$,
Fisher gives .5654. t = 1.989, $z_{1\%} = .855$, Fisher gives .7985. The Gram-
Charlier Type A is recommended only for $n_1 = n_2$ and $n_1, n_2 \ge 20$.
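
The coefficients quoted in (19.1) to (19.3) can be recomputed from the seminvariants. The Python sketch below is an illustration, not the author's method of fitting by ordinates. It assumes the usual three-term Type A form, with $A_3 = -\lambda_3/(6\sigma^3)$ and $A_4 = \lambda_4/(24\sigma^4)$; this convention is an assumption that happens to reproduce the tabulated $A_3$ and $A_4$. The function names are hypothetical.

```python
import math


def gram_charlier_coefficients(sigma, lam3, lam4):
    """Return (A3, A4) from the standard deviation and the 3rd and 4th seminvariants."""
    alpha3 = lam3 / sigma ** 3   # skewness
    excess = lam4 / sigma ** 4   # alpha4 - 3
    return -alpha3 / 6.0, excess / 24.0


def type_a_density(t, a3, a4):
    """Three-term series phi(t) + A3*phi'''(t) + A4*phi''''(t)."""
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    he3 = t ** 3 - 3.0 * t              # phi'''(t)  = -he3 * phi(t)
    he4 = t ** 4 - 6.0 * t * t + 3.0    # phi''''(t) =  he4 * phi(t)
    return phi * (1.0 - a3 * he3 + a4 * he4)


if __name__ == "__main__":
    # Inputs of example (19.3): sigma_z, lambda_3 and lambda_4 of z for n1 = 3, n2 = 20.
    a3, a4 = gram_charlier_coefficients(0.5099, -0.10222, 0.08822)
    print(a3, a4)                          # close to .12854 and .05438
    print(type_a_density(1.523, a3, a4))   # ordinate at the quoted 5% deviate
```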


20. Type III approximation, the median, and 5% point. Since for Type III
the median, $m_e$, is approximately two-thirds of the distance from the mode
to the mean if $\alpha_3$ is moderate [12], [6], then we have further assuming
$n_1, n_2 \ge 20$,


( 20 . 1 ) 



From experience this result will furnish an accuracy with an error less than 2% 
of the true value in the range above indicated. 


(20.2) $t_{5\%} = 1.6437 + .2760\alpha_3 - .04506\alpha_3^2$.


This was found by use of Salvosa's tables and for $\alpha_3 > 1.1$ by [14].

(20.3) $z_{5\%} = \sigma_z[1.644 + .2760\alpha_{3:z} - .0451\alpha_{3:z}^2] + \bar{z}$.


We illustrate the use of (20.3) with some examples.

(20.4) $n_1 = n_2 = 1$, $\sigma_z = 1.5706$, $\alpha_{3:z} = 0$, $\bar{z} = 0$, $z_{5\%} = 2.582$,
while the accurate value is $z_{5\%} = 2.5421$.

(20.5) $n_1 = \infty$, $n_2 = 1$, $\alpha_3 = 1.5351$, $\bar{z} = .6352$, $\sigma_z = 1.11$, $z_{5\%} = 2.81$. The
accurate value is 2.7693.

(20.6) $n_1 = n_2 = 5$, $\sigma_z = .4952$, $\alpha_{3:z} = 0$, $\bar{z} = 0$, $z_{5\%} = .8141$, while the
accurate value is $z_{5\%} = .8097$.

(20.7) $n_1 = 4$, $n_2 = 8$, $\bar{z} = -.0701$, $\sigma_z = .4819$, $\alpha_3 = -.3619$, $z_{5\%} = .6712$,
while the accurate value is .6725.

(20.8) $n_1 = 1$, $n_2 = 10$, $\bar{z} = -.6835$, $\sigma_z = 1.1353$, $\alpha_3 = -1.4333$, $z_{5\%} = .7283$,
while the accurate value is .8012.
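
Formula (20.3) is simple enough to evaluate directly. The following Python sketch is an illustration only; the function name `z_five_percent` is hypothetical, and the inputs are the mean, standard deviation and skewness of z quoted in the examples above.

```python
def z_five_percent(z_mean, sigma_z, alpha3):
    """5% point of z from (20.3): sigma_z*(1.644 + .2760*a3 - .0451*a3^2) + z_mean."""
    t = 1.644 + 0.2760 * alpha3 - 0.0451 * alpha3 ** 2
    return sigma_z * t + z_mean


if __name__ == "__main__":
    # (z_mean, sigma_z, alpha3) as quoted in examples (20.4) to (20.7)
    examples = {
        "(20.4)": (0.0, 1.5706, 0.0),
        "(20.5)": (0.6352, 1.11, 1.5351),
        "(20.6)": (0.0, 0.4952, 0.0),
        "(20.7)": (-0.0701, 0.4819, -0.3619),
    }
    for label, args in examples.items():
        print(label, round(z_five_percent(*args), 4))
```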

In a future paper exactly the same methods will be used for any per cent point
of z whatever in order to compare with the results of W. G. Cochran [2]. If
$n_1$ and $n_2$ are large we may use the approximate formulas for $\sigma_z$, $\alpha_{3:z}$, and $\bar{z}$
to obtain to the order of $\sigma_z^2$,

(20.9) $z_{5\%} = 1.644\sigma_z + .7760\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$, where $\sigma_z = \sqrt{\dfrac{1}{2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$.

We expand Fisher's result [9]

$z_{5\%} = \dfrac{1.6449}{\sqrt{h - 1}} + .7843\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$, where $\dfrac{2}{h} = \dfrac{1}{n_1} + \dfrac{1}{n_2}$,

by the binomial theorem to obtain a comparable result

(20.10) $z_{5\%} = 1.645\sigma_z + .7843\left(\dfrac{1}{n_2} - \dfrac{1}{n_1}\right)$.

The numerical examples given in this chapter illustrate unfavorable cases as 
well as favorable ones. 


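21. The distribution of F. Historically Snedecor [19] was the first to use F
for $e^{2z}$. We find

(21.1) $p(F) = \dfrac{n_1^{n_1/2}\, n_2^{n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)} \cdot \dfrac{F^{(n_1 - 2)/2}}{(n_2 + n_1 F)^{(n_1 + n_2)/2}}$, $0 \le F \le \infty$.

The distribution of F is J shaped if $n_1 \le 2$, and bell shaped for $n_1 > 2$, and for
$n_1 > 2$ one mode exists, $F_0 = \dfrac{n_2(n_1 - 2)}{n_1(n_2 + 2)}$. The two points of inflection, which
exist for $n_1 \ge 4$, are equidistant from the mode. The moments are

$\mu'_m = \left(\dfrac{n_2}{n_1}\right)^m \dfrac{\Gamma\!\left(\frac{n_1}{2} + m\right)\Gamma\!\left(\frac{n_2}{2} - m\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}$, $n_2 > 2m$,

$\bar{F} = \dfrac{n_2}{n_2 - 2}$, $n_2 > 2$,

$\mu_2 = \dfrac{2n_2^2(n_1 + n_2 - 2)}{n_1(n_2 - 2)^2(n_2 - 4)}$, $n_2 > 4$,

$\alpha_{3:F} \approx \dfrac{2\sqrt{2}\,(2n_1 + n_2)}{\sqrt{n_1 n_2 (n_1 + n_2)}}$.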
The exact results for $\mu_3$, $\mu_4$, $\alpha_3$, and $\alpha_4$ are omitted because of length. We
have the theorem that as $n_1, n_2 \to \infty$ in any manner whatever the distribution
of F approaches normality with mean $\bar{F} = 1$, $\sigma_F = \sqrt{2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$. The proof
is omitted. The only type of approximating curve of any value is Type III.
Of course the distribution of F is Type VI. No tables exist for Type VI.
Furthermore the F distribution approaches the Type III function so slowly as
to make most approximations of little value unless $\alpha_{3:F} \le 1.1$. Other possible
parameters are « = n»S + ij ^ I «»:» I “

2 I at:t I approximately we see that the distribution of H is more skewed than
that of z. We mention briefly also $S_1^2 - S_2^2$ where /S? = ^ ^ ^ 4 •

Ni Nf 

Clearly z, F, d, and H give equivalent levels of significance. This is not true
for z and $S_1^2 - S_2^2$.

Finally, since $F = \dfrac{s_1^2}{s_2^2}$, it may be interpreted as a quotient [5]. When the
moments of F do not exist, it is due to the distribution function of $s_2^2$.

22. Conclusion. We have found the seminvariants for the z distribution, and 
approximations for them. Type III, and the logarithmic normal frequency 
functions are shown to be excellent approximations to the z distribution. The 
approach to normality for the z distribution is proved. A formula is given for 
finding the 5% level of significance for z. The F distribution is studied along 
the same lines. As far as the construction of tables for levels of significance is 
concerned, the z distribution is much easier to use. My sincerest thanks are 
due Professor C. C. Craig for his helpful guidance and many suggestions. 

BIBLIOGRAPHY 

[1] H. C. Carver, Handbook of Mathematical Statistics, H. L. Rietz, ed., Boston:
Houghton-Mifflin Co., 1924. Chapter on frequency curves.

[2] W. G. Cochran, "Note on an approximate formula for the significance levels of z,"
Annals of Math. Stat., Vol. 11 (1940), pp. 93-95.

[3] C. C. Craig, "A new exposition and chart for the Pearson system of frequency curves,"
Annals of Math. Stat., Vol. 7 (1936), pp. 16-28.

[4] C. C. Craig, "An application of Thiele's semi-invariants to the sampling problem,"
Metron, Vol. 7 (1928-29), pp. 3-74.

[5] C. C. Craig, "The frequency function of y/x," Annals of Math., Second Series, Vol.
30 (1929), pp. 471-486.

[6] A. T. Doodson, "Relation of a mode, median, and mean in a frequency curve,"
Biometrika, Vol. 11 (1917), p. 425.

[7] R. A. Fisher, "On a distribution yielding the error functions of several well known
statistics," Proc. International Math. Cong., 1924, Toronto, Vol. 2, pp. 805-813.

[8] R. A. Fisher, and E. A. Cornish, "Moments and cumulants in the specification of
distributions," Revue de l'Institut International de Statistique, 5th year, pp. 307-20,
1937, La Hague.

[9] R. A. Fisher, and Yates, Statistical Tables for Biological, Agricultural, and Medical
Research, London: Oliver and Boyd, 1938.

[10] K. Knopp, Theory and Application of Infinite Series, English translation, Edinburgh:
Blackie and Son, 1928.

[11] N. Nielsen, Handbuch der Theorie der Gamma Functionen, Leipzig: Teubner, 1906.

[12] C. A. Olshen, "Transformation of the Pearson Type III distribution," Annals of
Math. Stat., Vol. 9 (1938), pp. 176-200.

[13] K. Pearson (Editor), Tables of the Incomplete Beta Function, London: Biometrika
Office, University College, London, 1934.

[14] K. Pearson (Editor), Tables of the Incomplete Gamma Function, London: His Majesty's
Stationery Office, 1922.





[15] K. Pearson, S. A. Stouffer, and F. N. David, "Further applications in statistics of
the $T_m(x)$ Bessel function," Biometrika, Vol. 24 (1932), pp. 293-360.

[16] S. J. Pretorius, "Skew bivariate frequency curves examined in the light of numerical
illustrations," Biometrika, Vol. 22 (1930-31).

[17] L. R. Salvosa, "Tables of Pearson's Type III function," Annals of Math. Stat., Vol.
1 (1930), pp. 191-8 et seq.

[18] J. Shohat, and M. Fréchet, "A proof of the generalized second limit-theorem in the
theory of probability," Trans. Am. Math. Soc., Vol. 33 (1931), pp. 533-43.

[19] G. W. Snedecor, Calculation and Interpretation of the Analysis of Variance and
Covariance, Ames, Iowa: Collegiate Press.

[20] T. J. Stieltjes, "Tables des valeurs des sommes $S_k = \sum_{n=1}^{\infty} n^{-k}$," Acta Math., Vol. 10,
pp. 299-302.

[21] Whittaker and Robinson, The Calculus of Observations, Edinburgh: Blackie and Son,
second edition, p. 135.

[22] Whittaker and Watson, Modern Analysis, 4th edition, London: Cambridge
University Press, 1935.

[23] Pae-Tsi Yuan, "On the logarithmic frequency distribution and the semi-logarithmic
correlation surface," Annals of Math. Stat., Vol. 4 (1933).

[24] R. T. Zoch, "Some interesting features of frequency curves," Annals of Math. Stat.,
Vol. 6 (1935), pp. 1-10.



THE DOOLITTLE TECHNIQUE 

By Paul S. Dwyer 
University of Michigan 

1. Introduction. Most authors who have presented the Doolittle method,
from Doolittle [1] down to the present, have not given any formal proof that the
solution is valid in the general case. They usually are content with a form
describing the various steps of a Doolittle solution.

The author has recently shown [2] that the Doolittle method can be abbreviated
to a technique which is also an abbreviation, essentially, of the method of
single division and its abbreviation which Aitken called the "Method of Pivotal
Condensation" [3]. It appears at once that the validity of the Doolittle method
follows from the validity of the method of single division, a validity which is
readily established.

However one may desire a “proof” which is based directly on the Doolittle 
technique without referring to other methods of solution. It is the chief 
purpose of this paper to present such a proof. It is accomplished by the intro- 
duction of a notation which precisely describes the conventional Doolittle 
process and by proving that this process results in a system of equations whose 
prediagonal terms are zero. It is a secondary purpose of the paper to emphasize 
the advantages of the Abbreviated Doolittle method and to explain and illus- 
trate minor variations in the conventional Doolittle technique. 

2. The Abbreviated Doolittle solution. We first direct our attention to the
essential parts of a Doolittle solution and these are the last two rows of each
matrix of the standard Doolittle presentation. The additional rows in the
standard presentation are rows of products which are used solely for the purpose
of finding the two bottom rows of each matrix and they need not be recorded,
if a computing machine is available, since the essential information is present
in the two bottom rows. Doolittle [1] did not have calculating machines (he
used multiplication tables) but he put the important information in Table A
and carefully segregated the supplementary information in Table B. With
reference to this he wrote [1]

“It is to be observed that the numbers in Table B have but a single use while 
those in Table A are used over and over, and where the number of equations is 
large, it is of great advantage that they should be thus tabulated by themselves 
in a form compact and easy of reference.” 

For purposes of proof, as well as for purposes of calculation if a computing
machine is available, it is only necessary to utilize the forward part of the
Abbreviated Doolittle solution which is the equivalent of the Doolittle Table A.

A four variable illustration of the Abbreviated Doolittle technique is presented
in Table I. The successive equations are indicated by number, as is customary,
and the operation which defines the equation is specified. The actual operation
is indicated more explicitly by the notation of column 3 and this is discussed in
the next section.

The presentation of Table I introduces one variation from the standard Doo- 
little method. The division is made by the diagonal coefficient of each row 
rather than by its negative. One may still use the old technique, if he prefers, 
but it is felt that one can subtract products as easily as he can add products with 
modern machines equipped with automatic negative multiplication. In addi- 
tion the entries of the equivalent rows then have the same signs and, too, it is 
not necessary to take the time to change the signs of the second rows. This 
variation uses the same division method as the method of single division [2] 
and as the method of pivotal condensation [3] so that the abbreviated form of 
these methods is, essentially, the same as the abbreviated form of the Doolittle 
method. 

The application of this technique leads at each step to a coefficient for each 
variable. However if the process is to lead from our four equations in four 
unknowns, to three in three, to two in two, to one in one, it follows that all the 
entries to the left of the diagonal, which we may call prediagonal entries, must 
be zero. That this is true in the general case is the objective of the proofs of 
later sections. 

3. A notation for and description of the Doolittle technique. A main
contribution of the present article is the use of a notation which describes the Doolittle
technique. As long as the Doolittle process is described loosely by means of
"operations" it is difficult to be precise in defining quantities which appear in
the calculation, but when a notation is used which is definite enough to permit
expansion in terms of the original coefficients, some sort of proof may be
available. The present notation bears some resemblance to that suggested by
Gauss [4], though Gauss used letters to indicate the primary subscripts and
numbers to indicate the number of secondary subscripts and his notation was
directly applicable to the sums of least squares theory rather than to symmetric
equations in general.

We wish to find the solution of the equations

(1) $\sum_{i=1}^{n} a_{ij} x_i = a_{n+1,j}$, $j = 1, 2, \cdots, n$

where the matrix of the coefficients is symmetric. We do this by obtaining
auxiliary equations which feature a decreasing number of variables. No serious
restriction is made if we assume that the variables $x_1, x_2, x_3$, etc., are eliminated
successively. The Doolittle technique may then be described as follows:

We take the first equation of (1) and divide by its leading coefficient, $a_{11}$, to get

(2) $\sum_{i=1}^{n} b_{i1} x_i = b_{n+1,1}$, where $b_{i1} = \dfrac{a_{i1}}{a_{11}}$,

and we then form

(3) $\sum_{i=1}^{n} a_{i2\cdot 1} x_i = a_{n+1,2\cdot 1}$ with $a_{i2\cdot 1} = a_{i2} - a_{i1} b_{21}$.

We then divide by $a_{22\cdot 1}$ and get

(4) $\sum_{i=1}^{n} b_{i2\cdot 1} x_i = b_{n+1,2\cdot 1}$ with $b_{i2\cdot 1} = \dfrac{a_{i2\cdot 1}}{a_{22\cdot 1}}$.

We next form

(5) $\sum_{i=1}^{n} a_{i3\cdot 12} x_i = a_{n+1,3\cdot 12}$ with $a_{i3\cdot 12} = a_{i3} - a_{i1} b_{31} - a_{i2\cdot 1} b_{32\cdot 1}$,

and

(6) $\sum_{i=1}^{n} b_{i3\cdot 12} x_i = b_{n+1,3\cdot 12}$ with $b_{i3\cdot 12} = \dfrac{a_{i3\cdot 12}}{a_{33\cdot 12}}$.

This process is continued so that, in general, we have

(7) $\sum_{i=1}^{n} a_{ij\cdot 12\cdots j-1} x_i = a_{n+1,j\cdot 12\cdots j-1}$, $j = 1, 2, \cdots, n$

and

(8) $\sum_{i=1}^{n} b_{ij\cdot 12\cdots j-1} x_i = b_{n+1,j\cdot 12\cdots j-1}$, $j = 1, 2, \cdots, n$

with

(9) $a_{ij\cdot 12\cdots j-1} = a_{ij} - a_{i1} b_{j1} - a_{i2\cdot 1} b_{j2\cdot 1} - a_{i3\cdot 12} b_{j3\cdot 12} - \cdots - a_{i,j-2\cdot 12\cdots j-3}\, b_{j,j-2\cdot 12\cdots j-3} - a_{i,j-1\cdot 12\cdots j-2}\, b_{j,j-1\cdot 12\cdots j-2}$

and

(10) $b_{ij\cdot 12\cdots j-1} = \dfrac{a_{ij\cdot 12\cdots j-1}}{a_{jj\cdot 12\cdots j-1}}$.

It is to be noted that the n equations (1) are transformed by this process to
the n auxiliary equations of (7) or (8). The solutions of (1) are also solutions
of these auxiliary equations since the auxiliary equations are linear combinations
of (1). It is our purpose to show that the prediagonal coefficients of these
auxiliary equations are always 0 so that these auxiliary equations feature a
decreasing number of variables.
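
The forward process can be sketched in a few lines of Python. The sketch is an illustration, not the author's computing form. It assumes that each "a" row is obtained from the corresponding original equation by subtracting, for every earlier row k, the earlier "a" row multiplied by the j-th entry of the earlier "b" row, as in (9), and that each "b" row is the "a" row divided by its diagonal entry, as in (10). The function name, the list-of-rows layout and the constant `EXAMPLE_SYSTEM` are hypothetical; the numbers are those of the four-variable example of Table II.

```python
def forward_doolittle(augmented):
    """Forward Doolittle pass on n symmetric equations.

    augmented[j] holds the coefficients of equation j followed by its constant
    term. Returns (a_rows, b_rows): the reduced rows and the same rows divided
    by their diagonal entries.
    """
    n = len(augmented)
    a_rows, b_rows = [], []
    for j in range(n):
        row = list(augmented[j])
        for k in range(j):
            multiplier = b_rows[k][j]  # entry j of the k-th "b" row
            row = [row[i] - a_rows[k][i] * multiplier for i in range(n + 1)]
        pivot = row[j]                 # diagonal coefficient of the reduced row
        a_rows.append(row)
        b_rows.append([value / pivot for value in row])
    return a_rows, b_rows


# Full symmetric system of the worked example, constants in the last column.
EXAMPLE_SYSTEM = [
    [1.0, 0.4, 0.5, 0.6, 0.2],
    [0.4, 1.0, 0.3, 0.4, 0.4],
    [0.5, 0.3, 1.0, 0.2, 0.6],
    [0.6, 0.4, 0.2, 1.0, 0.8],
]

if __name__ == "__main__":
    a_rows, b_rows = forward_doolittle(EXAMPLE_SYSTEM)
    for row in a_rows:
        print([round(value, 4) for value in row])
```

In this example every entry to the left of the diagonal in the reduced rows comes out as zero (up to rounding), which is the property the following sections set out to prove in general.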

We may use the term primary subscripts to indicate the first two subscripts 
and the term secondary subscripts to indicate the later subscripts which specify 
the order of elimination of the variables. The “order” of the coefficient is then 
equal to the number of secondary subscripts.

The formula (9) gives the matrix of the final Doolittle set of equations. At
each stage of the reduction one can write down a formula for all the elements
in the matrix at that stage. Thus one can write the coefficients of order k,
$a_{ij\cdot 12\cdots k}$, in terms of coefficients of order less than k,

(11) $a_{ij\cdot 12\cdots k} = a_{ij} - a_{i1} b_{j1} - a_{i2\cdot 1} b_{j2\cdot 1} - \cdots - a_{ik\cdot 12\cdots k-1}\, b_{jk\cdot 12\cdots k-1}$.

It follows at once that

(12) $a_{ij\cdot 12\cdots k} = a_{ij\cdot 12\cdots k-1} - \dfrac{a_{ik\cdot 12\cdots k-1}\, a_{jk\cdot 12\cdots k-1}}{a_{kk\cdot 12\cdots k-1}}$.


4. Some theorems on the interchangeability of subscripts. Our main
objective is to prove that the prediagonal terms are zero. In order to do this we first
prove some theorems dealing with the primary and secondary subscripts.

Theorem 1: The value of $a_{ij\cdot 12\cdots k}$ is not changed if the primary subscripts are
interchanged. This theorem which might be stated "The matrix of the
coefficients of a given order is symmetric" follows from the symmetry of the matrix
of coefficients of zero order. We can show that the symmetry of the matrix
having coefficients of order k follows at once from the symmetry of the matrix
having coefficients of order k - 1 by comparing the value $a_{ij\cdot 12\cdots k}$ with that of
$a_{ji\cdot 12\cdots k}$ obtained by dual substitution in (12). Since the matrix of zero order
coefficients is symmetric by hypothesis, it follows that the matrices of the
coefficients of order 1, 2, 3, 4, etc., are in turn symmetric.

Theorem 2: Any pair of consecutive secondary subscripts may be interchanged
without changing the value of the coefficient. This theorem indicates that, within
prescribed limits, the order of elimination does not have any effect on the result.

Consider the coefficient $a_{ij\cdot\cdots kl\cdots}$ having r secondary subscripts before the
k and s secondary subscripts after the l and consider the corresponding
coefficient $a_{ij\cdot\cdots lk\cdots}$ which results from an interchange of k and l. These coefficients
can be expressed by continued use of (12) in terms of coefficients of order r + 2.
The resulting expansion of $a_{ij\cdot\cdots kl\cdots}$ is equivalent to that of $a_{ij\cdot\cdots lk\cdots}$ with the
interchange of the l and the k. It follows that the theorem is true if
$a_{ij\cdot\cdots kl} = a_{ij\cdot\cdots lk}$. Now a double application of (12) to $a_{ij\cdot\cdots kl}$ leads to the expansion in
terms of coefficients of order r (using the notation $a_{ij\cdot}$ to indicate the coefficient
of the r-th order)

(13) $a_{ij\cdot\cdots kl} = a_{ij\cdot} - \dfrac{a_{ik\cdot}\, a_{jk\cdot}}{a_{kk\cdot}} - \dfrac{\left(a_{il\cdot} - \dfrac{a_{ik\cdot}\, a_{lk\cdot}}{a_{kk\cdot}}\right)\left(a_{jl\cdot} - \dfrac{a_{jk\cdot}\, a_{lk\cdot}}{a_{kk\cdot}}\right)}{a_{ll\cdot} - \dfrac{a_{lk\cdot}^2}{a_{kk\cdot}}}$.

Then $a_{ij\cdot\cdots lk}$ is expanded similarly, the difference is formed and found to be zero.
It follows that the theorem is true.

The application of Theorem 2 with the continued interchange of successive
secondary subscripts in all possible ways leads at once to

Theorem 3: The secondary subscripts may be interchanged in all possible ways
without changing the value of the coefficient. This theorem might be stated "The
value of the resulting coefficient is independent of the order of elimination."
This is the sort of result one would expect to find and indeed, some may feel that
it is intuitively evident, but this formal proof is presented for those who desire
a more rigorous approach.

Theorem 3 enables us to prove Theorem 4 which may be stated: The value of
$a_{ij\cdot kl\cdots}$ is always zero if at least one of the secondary subscripts is equal to one of
the primary subscripts.

Suppose i is this subscript. Then by Theorem 3, i may be placed in the final
position. Now by (12) we have

$a_{ij\cdot\cdots i} = a_{ij\cdot\cdots} - \dfrac{a_{ij\cdot\cdots}\, a_{ii\cdot\cdots}}{a_{ii\cdot\cdots}} = 0$.

A similar statement holds if j appears among the secondary subscripts. 
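
Theorems 1, 3 and 4 can be illustrated numerically. The Python sketch below is not a proof and is not taken from the paper. It assumes that one elimination step is the update (12), applied to every pair of primary subscripts, and it uses the coefficient matrix of the paper's four-variable example. The names `eliminate` and `COEFFICIENTS` are hypothetical.

```python
def eliminate(matrix, k):
    """Apply (12) once: every entry (i, j) loses matrix[i][k]*matrix[j][k]/matrix[k][k]."""
    size = len(matrix)
    return [[matrix[i][j] - matrix[i][k] * matrix[j][k] / matrix[k][k]
             for j in range(size)]
            for i in range(size)]


# Symmetric coefficient matrix of the worked example (zero order coefficients).
COEFFICIENTS = [
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
]

# Eliminate variables 1 and 2 in both possible orders (indices 0 and 1 here).
ORDER_12 = eliminate(eliminate(COEFFICIENTS, 0), 1)
ORDER_21 = eliminate(eliminate(COEFFICIENTS, 1), 0)

if __name__ == "__main__":
    for row_a, row_b in zip(ORDER_12, ORDER_21):
        print([round(v, 4) for v in row_a], [round(v, 4) for v in row_b])
```

Both orders give the same second order coefficients (Theorem 3), the result is symmetric (Theorem 1), and every entry having 1 or 2 as a primary subscript is zero (Theorem 4).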


5. The vanishing of the prediagonal entries. As an application of Theorem 4
we can show that the prediagonal entries are identically zero and this is exactly
what is needed to establish the validity of the forward Doolittle process. It is
to be noted that the prediagonal entries are of form $a_{ij\cdot 12\cdots j-1}$ with i < j. Then
i must equal one of the secondary subscripts and the term is zero.

It follows that no entries need be made to the left of the diagonal in the
Abbreviated Doolittle solution and, indeed, no entries need be made in the
original matrix below the main diagonal. A numerical problem is presented in
the next section.


6. Illustration. The Abbreviated Doolittle technique is illustrated in Table
II. This illustration is essentially an illustration of a previous article [2] and
serves as the basis, in a later section, for expansion into the standard Doolittle
solution. The check is shown in the right hand column and the back solution
is indicated. The check entries for the first matrix are obtained by adding the
entries in the row to the main diagonal and then adding the entries in the
column. All other check entries are obtained by adding the entries in the row.

The solution is easily made once it is understood and results from continued
application of formula (9). For example

$a_{54\cdot 123} = a_{54} - a_{51} b_{41} - a_{52\cdot 1} b_{42\cdot 1} - a_{53\cdot 12} b_{43\cdot 12}$

and this is

$a_{54\cdot 123}$ = .8000 - (.2000)(.6000) - (.3200)(.1905) - (.4619)(-.1612) = .6935

(see the underscored entries of Table II). Terms of this sort are easily
computed if a calculating machine, and especially so if one equipped with automatic
positive and negative multiplication, is available. The back solution too is
easily accomplished with a machine. It is only necessary to substitute in turn
in each of the "b" equations. Thus the value of $x_4$ is $\dfrac{a_{54\cdot 123}}{a_{44\cdot 123}} = b_{54\cdot 123}$, the value
of $x_3$ is $b_{53\cdot 12} - b_{43\cdot 12}\, b_{54\cdot 123} = b_{53\cdot 124}$, that of $x_2$ is
$b_{52\cdot 1} - b_{42\cdot 1}\, b_{54\cdot 123} - b_{32\cdot 1}\, b_{53\cdot 124} = b_{52\cdot 134}$, etc.
The back solution of the check is treated similarly.
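
The back solution can be written as a short loop. The Python sketch below is illustrative; it takes the four "b" rows as recorded (to four decimals) in Table II and substitutes in turn, starting from the last equation. The names `B_ROWS` and `back_solve` are hypothetical.

```python
# "b" rows of the worked example: coefficients of x1..x4, then the constant term.
B_ROWS = [
    [1.0, 0.4000, 0.5000, 0.6000, 0.2000],
    [0.0, 1.0, 0.1190, 0.1905, 0.3810],
    [0.0, 0.0, 1.0, -0.1612, 0.6258],
    [0.0, 0.0, 0.0, 1.0, 1.1748],
]


def back_solve(b_rows):
    """Solve the unit upper triangular "b" equations from the last one upward."""
    n = len(b_rows)
    x = [0.0] * n
    for j in range(n - 1, -1, -1):
        known = sum(b_rows[j][i] * x[i] for i in range(j + 1, n))
        x[j] = b_rows[j][n] - known
    return x


if __name__ == "__main__":
    solution = back_solve(B_ROWS)
    print([round(value, 4) for value in solution])
    # Basic check: substitute in the fourth original equation, whose constant is .8000.
    fourth = [0.6, 0.4, 0.2, 1.0]
    print(sum(c * v for c, v in zip(fourth, solution)))
```

The values obtained agree with the back solution rows of Table II to within the rounding of the recorded entries.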

7. A variation in technique. Before proceeding with the presentation of a
standard Doolittle solution it seems wise to indicate another possible variation
in the technique in addition to the division by the diagonal coefficient rather
than its negative. It is possible to obtain the Doolittle solution by using the
fixed entry from the first of the equivalent rows in place of using the fixed "b"
entry and the variable "a". This results from the fact that

(14) $a_{ik\cdot\cdots}\, b_{jk\cdot\cdots} = a_{jk\cdot\cdots}\, b_{ik\cdot\cdots} \left( = \dfrac{a_{ik\cdot\cdots}\, a_{jk\cdot\cdots}}{a_{kk\cdot\cdots}} \right)$.

Thus in Table II the value $a_{54\cdot 123}$ can be obtained with the use of

$a_{54\cdot 123} = a_{54} - a_{41} b_{51} - a_{42\cdot 1} b_{52\cdot 1} - a_{43\cdot 12} b_{53\cdot 12}$

as readily as with the use of

$a_{54\cdot 123} = a_{54} - a_{51} b_{41} - a_{52\cdot 1} b_{42\cdot 1} - a_{53\cdot 12} b_{43\cdot 12}$.

See the boxed entries of Table II.

There seems to be no real choice between these techniques. The fixed "b"
is traditional in the standard Doolittle solution while the abbreviation of the
method of single division leads to a fixed "a". The point to be emphasized here
is that either the fixed "a" or the fixed "b" can be used. Also (14) is used in
the next section in supplying details for the check portion of a standard
Doolittle method.

8. The standard Doolittle method. If no computing machine is available 
or if a more detailed solution is desired, it is preferable to record the individual 
products of (9) and thus arrive at the standard Doolittle method. (The division 
by the diagonal coefficient rather than its negative is not a fundamental differ- 
ence.) The standard Doolittle method, from this point of view, is an expanded 
form of the Abbreviated Doolittle method with more details added. Its validity 
then follows from the validity of the Abbreviated Doolittle method. While it
is not true that all prediagonal terms vanish in the standard Doolittle method,
and this fact complicates the check by row sums, yet the prediagonal $a_{ij\cdot\cdots}$
(and $b_{ij\cdot\cdots}$) are all zero.

The standard Doolittle method is presented in Table III. Some remarks 
should be made about the non-recorded terms, the two check solutions, and the 
back solution. 

The blanks (—) indicate nonzero entries which are usually not presented in a
Doolittle solution. They should be considered however if the first check method
is to be used.

The first check method, which is the logical extension of the check method of
the Abbreviated Doolittle solution, has been outlined by Ezekiel [5]. The row
sum is the sum of all the entries in the row whether recorded or not. In order
to check, it is necessary to add these unrecorded entries, and they are available
in the columns above if we make use of formula (14). Thus, if we wish to check
the value $a_{i1} b_{41}$ = 1.6200, we have

$a_{11} b_{41} + a_{21} b_{41} + a_{31} b_{41} + a_{41} b_{41} + a_{51} b_{41}$ =

$a_{41} + a_{41} b_{21} + a_{41} b_{31} + a_{41} b_{41} + a_{51} b_{41}$ =

.6000 + .2400 + .3000 + .3600 + .1200 = 1.6200.


TABLE II

Abbreviated Doolittle Solution; illustration

    x1        x2        x3        x4                   Check

  1.0000     .4000     .5000     .6000     .2000      2.7000
    —       1.0000     .3000     .4000     .4000      2.5000
    —         —       1.0000     .2000     .6000      2.6000
    —         —         —       1.0000     .8000      3.0000

  1.0000     .4000     .5000     .6000     .2000      2.7000
  1.0000     .4000     .5000     .6000     .2000      2.7000

             .8400     .1000     .1600     .3200      1.4200
            1.0000     .1190     .1905     .3810      1.6905

                       .7381    -.1190     .4619      1.0810
                      1.0000    -.1612     .6258      1.4646

                                 .5903     .6935      1.2837
                                1.0000    1.1748      2.1747

                      1.0000               .8152      1.8152
            1.0000                         .0602      1.0602
  1.0000                                  -.9366       .0635

Another check method, which is recommended by Peters and Van Voorhis [6],
sums the entries in the row only over those columns which are to be recorded.
This is presented as check method 2 of Table III. As is to be expected, the check
values of the a's and b's of the last two rows of each matrix are in agreement.

It might be noted that one may use the first check method without checking
the intermediate steps (the sums for each row) if he checks the sums for the last
two rows of each matrix.


TABLE III

Doolittle solution, with checks

Notation                                 x1        x2        x3        x4                   Check      Check     Back solution
                                                                                           Method 1   Method 2   of check

$a_{i1}$                               1.0000     .4000     .5000     .6000     .2000      2.7000     2.7000
$a_{i2}$                                 —       1.0000     .3000     .4000     .4000      2.5000     2.1000
$a_{i3}$                                 —         —       1.0000     .2000     .6000      2.6000     1.8000
$a_{i4}$                                 —         —         —       1.0000     .8000      3.0000     1.8000

$a_{i1}$                               1.0000     .4000     .5000     .6000     .2000      2.7000     2.7000
$b_{i1}$                               1.0000     .4000     .5000     .6000     .2000      2.7000     2.7000

$a_{i2}$                                 —       1.0000     .3000     .4000     .4000      2.5000     2.1000
$a_{i1} b_{21}$                          —        .1600     .2000     .2400     .0800      1.0800      .6800
$a_{i2\cdot 1}$                                   .8400     .1000     .1600     .3200      1.4200     1.4200
$b_{i2\cdot 1}$                                  1.0000     .1190     .1905     .3810      1.6905     1.6905

$a_{i3}$                                 —         —       1.0000     .2000     .6000      2.6000     1.8000
$a_{i1} b_{31}$                          —         —        .2500     .3000     .1000      1.3500      .6500
$a_{i2\cdot 1} b_{32\cdot 1}$                      —        .0119     .0190     .0381       .1690      .0690
$a_{i3\cdot 12}$                                            .7381    -.1190     .4619      1.0810     1.0810
$b_{i3\cdot 12}$                                           1.0000    -.1612     .6258      1.4646     1.4646

$a_{i4}$                                 —         —         —       1.0000     .8000      3.0000     1.8000
$a_{i1} b_{41}$                          —         —         —        .3600     .1200      1.6200      .4800
$a_{i2\cdot 1} b_{42\cdot 1}$                      —         —        .0305     .0610       .2705      .0914
$a_{i3\cdot 12} b_{43\cdot 12}$                              —        .0192    -.0745      -.1743     -.0553
$a_{i4\cdot 123}$                                                     .5903     .6935      1.2838     1.2839
$b_{i4\cdot 123}$                                                    1.0000    1.1748      2.1748

$b_{i3\cdot 124}$                                          1.0000    -.1894     .8152      1.8152                -.3506
$b_{i2\cdot 134}$                                1.0000     .0970     .2238     .0602      1.0602                 .4143    .2160
$b_{i1\cdot 234}$                      1.0000     .0241     .4076     .7049    -.9366       .0634                1.3049    .9076    .4241


The back solution is carried out as in Table II. If no computing machine is
available or if the detailed steps are desired they may be indicated as in Table
III. The entries in the box under the x_4 column are respectively b_{u4.123}b_{43.12},
b_{u4.123}b_{42.1}, and b_{u4.123}b_{41}. Those in the preceding column are b_{u3.124}b_{32.1} and
b_{u3.124}b_{31}. The other entry is b_{u2.134}b_{21}. The values of the coefficients are obtained
by subtracting these row entries from the constant term of the corresponding
“b” equation. Thus, b_{u3.124} = (.6258) − (−.1894); b_{u2.134} =
(.3810) − .0970 − .2238, etc. The back solution of check method 1 agrees
with that of check method 2. A form for accomplishing the back solution of
the check is indicated at the right. It is not necessary to complete the back
solution of the check if it is not desired, and indeed, there are some who feel
that the use of the row sum check is unnecessary with modern computing machines
[7]. The basic check is substitution in the original equations.
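
The forward solution, the row-sum check and the back solution described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's computing form: the function names are hypothetical, the coefficient matrix is taken to be symmetric, and the check column is the full row sum (check method 1). The data are the coefficients and constant terms of Table III.

```python
def doolittle_solve(a, u, tol=1e-9):
    """Solve the symmetric system a x = u by a Doolittle-style forward
    elimination and back solution, verifying a row-sum check at each step.
    (Hypothetical helper written for illustration.)"""
    n = len(a)
    reduced = []   # rows such as a_{i2.1}, a_{i3.12}, ...
    normed = []    # the corresponding "b" rows, pivot scaled to 1
    for k in range(n):
        row = list(a[k]) + [u[k]]
        row.append(sum(row))                 # check column: full row sum
        for a_row, b_row in zip(reduced, normed):
            m = a_row[k]                     # multiplier, using symmetry of a
            row = [r - m * b for r, b in zip(row, b_row)]
        # Row-sum check: the entries of the reduced row must add to its check value.
        if abs(sum(row[:n + 1]) - row[n + 1]) > tol:
            raise ArithmeticError(f"row-sum check failed at row {k + 1}")
        reduced.append(row)
        normed.append([r / row[k] for r in row])
    # Back solution: start from the last "b" row and substitute upward.
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = normed[k][n] - sum(normed[k][j] * x[j] for j in range(k + 1, n))
    return x


def residuals(a, u, x):
    """Basic check: substitute the solution in the original equations."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - ui
            for row, ui in zip(a, u)]


# Coefficients and constant terms of Table III.
A_TABLE_III = [
    [1.0, 0.4, 0.5, 0.6],
    [0.4, 1.0, 0.3, 0.4],
    [0.5, 0.3, 1.0, 0.2],
    [0.6, 0.4, 0.2, 1.0],
]
U_TABLE_III = [0.2, 0.4, 0.6, 0.8]

if __name__ == "__main__":
    solution = doolittle_solve(A_TABLE_III, U_TABLE_III)
    print([round(v, 4) for v in solution])
    print(residuals(A_TABLE_III, U_TABLE_III, solution))
```

Because the sketch carries full floating-point precision, its results can differ from the four-decimal entries of the table in the last recorded place.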

9. Summary. The chief purpose of this paper is to show that the Doolittle
technique actually leads to a set of equations featuring a decreasing number of 
unknowns. This is accomplished by the introduction of an appropriate notation 
to describe the process and the establishment of certain theorems which serve 
to validate the process. These theorems are of some interest aside from the 
application made here. It is a secondary purpose of this paper to emphasize 
the practicability and theoretical advantages (relative ease of calculating, theo- 
retically more accurate, less chance for numerical error, less recording, less time 
consuming, more compact, and more easily checked) of the Abbreviated Doo- 
little method and to explain and illustrate possible variations in technique in the 
forward and check (by row sums) portions of the standard Doolittle solution. 
It should be noted that the notation suggested is very useful in providing an 
easy development of various theorems used in multiple and partial correlation 
studies, the presentation of which is not the purpose of the present paper. 
NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


A PROBLEM IN ESTIMATION 

By Joseph F. Daly 
The Catholic University of America 

Several recent psychological studies in the field of memory testing [1], [2], [3]
have suggested the following problem. Let each individual E in our population
be characterized by the variates y^1, ..., y^p; y^{p+1}, ..., y^{p+t} (p > t). Suppose,
however, that circumstances make it impossible for us to observe the last
t variates. For example, we may think of y^1, ..., y^p as an individual's scores
on a battery of tests, and think of y^{p+1}, ..., y^{p+t} as measures of certain psychological
characteristics which, though affecting the individual's performance, are
not subject to direct observation. To make up for this, assume that we have
a theory which tells us that if y^{p+1}, ..., y^{p+t} are held constant, then the observable
y's are dependent upon them according to a specified regression equation

y^i = x^i_μ y^μ, (i = 1, ..., p; μ = p + 1, ..., p + t).

Somewhat more precisely, we assume the distribution laws

(1) f(y^1, ..., y^{p+t}) = (2π)^{-(p+t)/2} |A_{rs}|^{1/2} exp {-½ A_{rs}(y^r - a^r)(y^s - a^s)},

(where r, s = 1, ..., p + t and repeated indices are to be summed according
to the usual convention) and

(2) f(y^1, ..., y^p | y^{p+1}, ..., y^{p+t}) = (2πσ²)^{-p/2} exp {-(1/2σ²) Σ_i (y^i - x^i_μ y^μ)²}.

The x^i_μ are supposed to be known, but except for the conditions imposed by (1)
and (2) nothing is known about the quantities A_{rs}, a^r, and σ². Having observed
the test scores y^i_α, (α = 1, ..., N) obtained by N individuals E_α drawn at
random from the population, we wish to estimate the values y^{p+1}_α, ..., y^{p+t}_α
corresponding to each E_α, and the essential parameters in the distribution law
(1), particularly the variances and covariances of y^{p+1}, ..., y^{p+t}.

We can easily find optimum estimates of the y^μ_α by applying the method of
maximum likelihood to the function (2) after substituting for the y^i the scores
y^i_α obtained by the individual in question. Thus if we write

(assuming thereby that the rank of the matrix || x^i_μ || is t) we have

(3) yi = rxW,. 

These estimates are unbiased in the sense that the expected value of ŷ^μ_α calculated
from the distribution law (2) is y^μ_α.

But when we come to estimate the variances and covariances involved in (1), 
the procedure is less straightforward. Under the present circumstances we 
cannot use the expression 

(4) Z (/. - fWu - yl, 

for the sample covariance of y^μ and y^ν. We might, of course, try substituting
the estimates from (3) for the unknown y^μ_α in (4). But this expedient will
in general produce a biased estimate. Denoting the required covariance by
A^{μν} (the element in the appropriate position in the inverse of the matrix || A_{rs} ||),
we find as a matter of fact that the expected value of (4) when the y^μ_α are replaced
by their estimates ŷ^μ_α is

(5) 4"' -H (tV'. 

This bias may or may not be important in any given case. But it can conceivably
be quite serious if the A^{μν} are relatively small, especially if such expressions
are employed in the usual way to estimate the correlation coefficient rather than
the covariance.
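
The origin of a bias of this kind can be sketched in matrix notation. The notation is an assumption introduced only for this illustration and does not appear in the paper: X is the p × t matrix of the known regression coefficients, η the vector of the t unobservable variates of one individual, Σ their covariance matrix, and ε the vector of deviations in (2), taken to be independent of η. The least-squares estimate then carries an error term whose covariance is added to Σ:

```latex
\begin{aligned}
y &= X\eta + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \sigma^{2} I_{p}, \\
\hat{\eta} &= (X'X)^{-1}X'y = \eta + (X'X)^{-1}X'\varepsilon, \\
\operatorname{Cov}(\hat{\eta}) &= \Sigma + \sigma^{2}(X'X)^{-1}.
\end{aligned}
```

Under these assumptions the covariances of the estimates exceed the required covariances by a term proportional to σ², so the relative error is largest when the required covariances are small.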

Perhaps the most logical way to attack the problem is through the joint
distribution of y^1, ..., y^p alone, obtainable by integrating the undesirable
variates y^{p+1}, ..., y^{p+t} out of (1). We therefore consider

(6) /(y', • • • , y”) = (2ir)"*'’ I Aii I* exp { -il,-,(y* - a’)(y^ - o')}, 
where 

I.-, = IIB'-'II = llA,,r-‘ 

Moreover, when account is taken of (2), we find that we must have 
= ^ ^ a* - afjo" 

<r ff 

(S<i being Kronecker’s delta). If we now form the likelihood function 
n/(y‘« > ••• , yl) from (6) for our sample, and set its derivatives with respect 

a-l 

to the a", ff*, and the B"', equal to zero, we arrive, after some simplification, at 
the equations 


Icf. (3)1 
2 (yi - 4o'‘)(yt - ajto')} Sii » 0,

(7) ; “ I 

(y« - xyxvi - xia^j xixi - 0, 

+ xiA^^xi, 

for determining the maximum likelihood estimates. The first of equations (7) 
is already solved for the o", and the solution of the simultaneous equations for 
the remaining essential parameters yields the estimates

(8) L (ya - x; 

(9) ^ 


A considerable amount of algebraic manipulation is required to put the solutions
in the form given above; but since the results are about what one would
expect in view of (5), we omit the details. As is often the case, some bias remains
in the “optimum” estimates (9). However, this can be eliminated by
writing N − 1 in place of N. The estimate (8) of σ² is unbiased as it stands.
CONFIDENCE LIMITS FOR AN UNKNOWN DISTRIBUTION FUNCTION

By A. Kolmogoroff
Moscow, U.S.S.R.

Let x_1, x_2, ..., x_n be mutually independent random variables following the
same distribution law

(1) P{x_i ≤ ξ} = F(ξ).

A recent paper by A. Wald and J. Wolfowitz¹ deals with the problem of using
the observable values of the x's to estimate the function F(ξ). In this connection
it may be useful to recall the following results published by me in 1933.²
Put

(2) F_n(ξ) = N(ξ)/n

where N(ξ) denotes the number of those x's whose observed values do not
exceed ξ.

¹ A. Wald and J. Wolfowitz, “Confidence limits for distribution functions,” Annals of
Math. Stat., Vol. 10 (1939), pp. 105-118.

Theorem 1: If the function F(ξ) is continuous then the distribution law of the
quantities

(3) D_n = √n sup |F(ξ) − F_n(ξ)|

does not depend on F(ξ).

Denote by Φ_n(λ) the value of the probability P{D_n < λ} which is common
to all continuous distribution functions F(ξ).

Theorem 2: For n tending to infinity, the distribution function Φ_n(λ) tends to

(4) Φ(λ) = Σ_{k=−∞}^{+∞} (−1)^k e^{−2k²λ²}

uniformly with respect to λ.

A more elementary proof of Theorem 2 was given by N. Smirnoff in 1939.³
Another paper by the same author⁴ gives a table of the function Φ(λ).

Without the assumption that F(ξ) is continuous, we easily obtain
Theorem 3: Whatever be the distribution function F(ξ),

(5) P{D_n < λ} ≥ Φ_n(λ).

Theorems 1 and 3 giving the exact lower bound of the probability that F_n(ξ)
will satisfy the inequality

(6) |F(ξ) − F_n(ξ)| < λ/√n

for all values of ξ can be used to establish confidence limits for F(ξ) corresponding
to the confidence coefficient

(7) α = Φ_n(λ).

These confidence limits will be free from any restriction concerning the nature
of the function F(ξ).

² A. Kolmogoroff, “Sulla determinazione empirica di una legge di distribuzione,” Giornale
dell'Istituto Italiano degli Attuari, Vol. 4 (1933), pp. 83-91.

³ N. Smirnoff, “Sur les écarts de la courbe de distribution empirique,” Recueil Math. de
Moscou, Vol. 6 (1939), pp. 3-26.

⁴ N. Smirnoff, “On the estimation of the discrepancy between empirical curves of distribution
for two independent samples,” Bulletin de l'Université de Moscou, Série internationale
(Mathématiques), Vol. 2, fasc. 2 (1939).
For sufficiently large values of n we can use the limiting distribution (4) and
write

(8) α ∼ Φ(λ).

The following short table, based on that of Smirnoff,⁴ gives the values of λ
corresponding to a few chosen confidence coefficients α.

TABLE OF λ

    α        λ

   .95     1.35
   .98     1.52
   .99     1.63
   .995    1.73
   .998    1.86
   .999    1.95
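
The tabulated values can be checked numerically against the limiting distribution (4), and the inequality (6) can be turned into a band around the empirical distribution function. The Python sketch below is an illustration only: the function names are hypothetical, the series is truncated at an assumed number of terms (adequate for λ of the size tabulated, not for λ close to zero), and the clipping of the band to the interval from 0 to 1 is an addition not stated in the text.

```python
import math


def kolmogorov_limit_cdf(lam, terms=100):
    """Limiting distribution (4): sum over all integers k of
    (-1)^k exp(-2 k^2 lam^2), truncated at |k| <= terms."""
    if lam <= 0:
        return 0.0
    total = 1.0  # the k = 0 term
    for k in range(1, terms + 1):
        # the terms for +k and -k are equal, hence the factor 2
        total += 2.0 * (-1) ** k * math.exp(-2.0 * k * k * lam * lam)
    return total


def empirical_cdf(sample, xi):
    """F_n(xi) as in (2): proportion of observations not exceeding xi."""
    return sum(1 for x in sample if x <= xi) / len(sample)


def confidence_band(sample, lam):
    """Limits F_n(xi) -/+ lam / sqrt(n) from inequality (6), evaluated at
    the observed values and clipped to [0, 1] (the clipping is an assumption)."""
    half_width = lam / math.sqrt(len(sample))
    band = []
    for xi in sorted(sample):
        fn = empirical_cdf(sample, xi)
        band.append((xi, max(0.0, fn - half_width), min(1.0, fn + half_width)))
    return band


TABLE_OF_LAMBDA = {0.95: 1.35, 0.98: 1.52, 0.99: 1.63,
                   0.995: 1.73, 0.998: 1.86, 0.999: 1.95}

if __name__ == "__main__":
    for alpha, lam in TABLE_OF_LAMBDA.items():
        print(alpha, lam, kolmogorov_limit_cdf(lam))
```

Since the band rests on the limiting distribution, it is an approximation for finite n in the sense of (8).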


Smirnoff's paper⁴ contains still another application of the function Φ(λ).
Denote by x'_1, x'_2, ..., x'_{n_1} and x''_1, x''_2, ..., x''_{n_2} two sequences of mutually independent
random variables following the same probability law F(ξ). Let further
F'_{n_1}(ξ) and F''_{n_2}(ξ) be two random step functions corresponding to these series,
defined as in (2). Smirnoff proves then the following
Theorem 4: If the probability law F(ξ) is continuous, then the probability

(9) P{sup |F'_{n_1}(ξ) − F''_{n_2}(ξ)| < λ √((n_1 + n_2)/(n_1 n_2))} = Φ_{n_1, n_2}(λ)

is independent of the function F(ξ). If n_1 and n_2 are indefinitely increased subject
to the restriction that the ratio n_1/n_2 remains between two fixed numbers a_1 and a_2

(10) 0 < a_1 ≤ n_1/n_2 ≤ a_2 < +∞

then

(11) Φ_{n_1, n_2}(λ) → Φ(λ).

In the general case, where the probability law F(ξ) is absolutely arbitrary we have

(12) P{sup |F'_{n_1}(ξ) − F''_{n_2}(ξ)| < λ √((n_1 + n_2)/(n_1 n_2))} ≥ Φ_{n_1, n_2}(λ).

Owing to the above results the quantity

(13) D_{n_1, n_2} = sup |F'_{n_1}(ξ) − F''_{n_2}(ξ)| √(n_1 n_2/(n_1 + n_2))

could be used as a criterion to test the hypothesis that the probability laws of
the two series of observable variables are actually the same.
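
A minimal Python sketch of the criterion (13) follows. The function names are hypothetical, and the comparison with a critical value λ taken from the limiting distribution is an approximation for finite samples. Because both step functions are constant between observed values, the supremum is attained at one of the observations, so it is enough to evaluate the difference there.

```python
import bisect
import math


def step_function(sample):
    """Empirical step function as in (2): proportion of observations
    not exceeding xi."""
    data = sorted(sample)
    n = len(data)

    def f(xi):
        return bisect.bisect_right(data, xi) / n

    return f


def two_sample_statistic(first, second):
    """D_{n1,n2} of (13): largest distance between the two step functions,
    multiplied by sqrt(n1 * n2 / (n1 + n2))."""
    f1, f2 = step_function(first), step_function(second)
    n1, n2 = len(first), len(second)
    sup = max(abs(f1(x) - f2(x)) for x in list(first) + list(second))
    return sup * math.sqrt(n1 * n2 / (n1 + n2))


def same_law_plausible(first, second, lam=1.35):
    """True when the statistic does not exceed lam; the default 1.35 is the
    tabulated value for the confidence coefficient .95."""
    return two_sample_statistic(first, second) <= lam


if __name__ == "__main__":
    print(two_sample_statistic([1, 2, 3, 4], [5, 6, 7, 8]))
    print(same_law_plausible([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5]))
```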






CORRECTIONS TO A PAPER ON THE UNIQUENESS PROBLEM OF MOMENTS

By M. G. Kendall
London, England

I wish to make certain corrections in my paper on “Conditions for Unique- 
ness in the Problem of Moments" {Annals of Math. Stat., Vol. 11 (1940), p. 402). 
I thought I had succeeded in improving on results given earlier by Stieltjes, 
Lévy and Carleman, but this is not so.

Theorem 1 of the paper stated that a set of moments determines a distribution
uniquely if Σ_{r=0}^{∞} ν_r t^r / r! converges for some real non-zero t, ν_r being the absolute
moment of order r. This is true, and a similar result has been proved by Lévy,
but my proof contained a small lacuna. It was shown that the characteristic
function φ(t) has a Taylor expansion which, under the conditions of the theorem,
is convergent; but it has also to be shown that it is equal to the sum of that
expansion. This may be seen as follows:

We have

e^{itx} = Σ_{r=0}^{n−1} (itx)^r / r! + θ (tx)^n / n!,   |θ| ≤ 1,

and hence, on taking mean values,

φ(t) = Σ_{r=0}^{n−1} (it)^r μ_r / r! + θ' ν_n t^n / n!,   |θ'| ≤ 1.

Since by hypothesis ν_n t^n / n! → 0, φ(t) must be equal to the sum of its (convergent)
Taylor expansion.

The principal error was a statement that ν_n^{1/n}/n must either tend to a limit or
diverge. For this reason, the second theorem should run: a set of moments determines
a distribution uniquely if lim sup ν_n^{1/n}/n is finite (not lim ν_n^{1/n}/n as originally
stated). Theorem 3 should also be restated with the upper limit substituted
for the limit therein.

Theorem 4 stated that a set of moments uniquely determines a distribution
if Σ 1/ν_n^{1/n} diverges. A rigorous proof is as follows:
The characteristic function obeys the relation

|φ^{(n)}(t)| ≤ ν_n

provided, of course, that ν_n exists. A theorem of Denjoy¹ states that if a function
f(x), defined in the segment (a, b), possesses derivatives of all orders therein,
if M_n is the maximum of |f^{(n)}(x)| in the segment and if Σ 1/M_n^{1/n} is divergent,
then f(x) is completely determined by its value and that of its derivatives at a
single point. φ(t) obeys the conditions of the theorem and by taking the point
to be t = 0, theorem 4 follows.

¹ Arnaud Denjoy, “Sur les fonctions quasi-analytiques de variable réelle,” Comptes Rendus,
Vol. 178 (1921), p. 1899.

I hope that this note will correct any misunderstandings that may have arisen 
on the main paper, and I regret that a number of circumstances, not the least
of which is war, have made it impossible to forward the correction at an 
earlier date. 


ANNOUNCEMENT CONCERNING COMPUTATION OF 
MATHEMATICAL TABLES 

In the December, 1939, issue of the Annals of Mathematical Statistics, p. 399,
there appeared an Announcement of the Mathematical Tables Project. This
project is operated by the Work Projects Administration of New York City,
as O. P. No. 265-2-97-11 under the technical supervision of Dr. A. N. Lowan.
It is sponsored by the National Bureau of Standards, Dr. Lyman J. Briggs, 
Director. 

In order to keep the readers of the Annals up-to-date on the progress of the 
work of the Project, information will be released from time to time. 

The following list shows the status of work, as of October, 1941. The reader
is referred to the December, 1939 issue of the Annals with respect to which n
will denote the nth item of Tables Published, Pn will denote the nth item of
Tables in Progress and Cn will denote the nth item of Tables under Consideration.

Tables published. 1, 2, 3, P1, P2, P3, P4, P6(b), P6(c), P6(d), P6(e), P7,
C7 and also

1. Table of Five-Point Lagrangian Interpolants for arguments ranging be- 
tween 0 and 2 at intervals of 0.001. 

2. Tables of Grid Coordinates (American Polyconic Projection) at 5 minute
intervals of latitude and longitude for latitude from 70°N to 28°N and for latitude
from 49°N to 72°N.

3. Table for Map Projections of Northwestern Extension of U. S. 

Tables in process of reproduction. P5, P6(a), P8 and C1 for [0 (.001) 7 (.01)
60 (.1) 300 (1) 2,000 (10) 10,000; 12D] also

1. Tables of Section Moduli and Moments of Inertia for Structural Members 
used in Naval Architecture. (For the Bureau of Marine Inspection and 
Navigation.) 

2. Tables of Si(x) and Ci(x) for x ranging from 10 to 100 at intervals of 0.001.

3. The zeros of the Legendre Polynomials up to the 16th order to 16 decimal
places and the Weight Coefficients for Gauss' Mechanical Quadrature Formula.

Tables for which manuscripts are completed. P9, P11, C6, (the function x*,
instead of A{x, y), has been tabulated to 16 places), and also 

1. Table of ∫ J_0(t) dt from 0 to 10 at intervals of 0.01 to 10 places.

Tables for which computations are completed. P10 (also tanh x, coth x),
C2, C3, (change to n = −21, −20, ..., 0) and also

1. Various hydraulic tables based on Kutter’s and Manning’s formulae. 
(Tabulation suggested by the War Department.) 

2. Table of reciprocals of the integers from 100,000 to 200,000. 

3. Table of the Associated Legendre Functions P_n^m(x) and Q_n^m(x) for n ranging
between 1 and 10, and m between 0 and 4; for arguments x and ix where x
ranges between 0 and 10 at intervals of 0.1. Also corresponding values for half-integral
values of n and values of the functions for arguments in degrees. (Tabulation
suggested by National Defense Research Committee.)

4. Tables of R sin θ and R cos θ. R = 1000 (10) 10,000, θ = 5(5)800 (in
mils).

Tables for which computations are in progress. C3 (for n = 1, 2, ..., 20)
and also

1. Table of the Bessel Functions Y_0(z) and Y_1(z) for the same complex arguments
as in J_0(z) and J_1(z), mentioned in P9.

2. Tables of Length of Meridional Arc at one-minute intervals. 

3. Tables of the Confluent Hypergeometric Function for selected values of 
the parameters. 

4. Tables of three-point, four-point, six-point and seven-point Lagrangian 
Interpolants. 

5. Table of Tchebysheff Polynomials.

Tables under consideration. C4 and also 

1. Table of the first 10 powers of the reciprocals of the integers from 1 to 1,000. 

2. Extensive tables of Elliptic Functions for both real and imaginary
arguments. 

3. A 12-place table of Inverse Circular and Hyperbolic Functions other than 
Arc tan x. 

4. Table of the Integral ∫ Y_0(t) dt.

5. Tables of the non-periodic solutions of the Mathieu Differential Equation. 

6. Table of the Error Functions for complex arguments (suggested by Federal
Communication Commission).

7. Tables of the Unit-Sigma Functions and their integrals. 
8. Tables of Circular Functions for Complex Arguments.

9. Tables of the Zeros of the Hermite and Laguerre Polynomials and of the
corresponding Weight Factors in Gauss' Mechanical Quadrature Formula.

10. Table of Lamé Polynomials.

11. Table of Military Grid Coordinates for certain "Control Stations." (For
the War Department.)

12. Tables of the Chi-Square Distribution and "Student's" t-distribution.

13. Tabulation of Fisher's A-, B-, and C- Distributions of the Multiple Correlation
Coefficients.

The Project would welcome suggestions for the computation of new tables of
interest in pure and applied mathematics, as well as information regarding computational
work in progress elsewhere.

Communications should be addressed to Major Irving V. Huie, Administrator,
Work Projects Administration, 70 Columbus Avenue, New York City.

Requests for copies of published tables should be addressed to Dr. Lyman J.
Briggs, Director of the National Bureau of Standards, Washington, D. C.



REPORT OF THE CHICAGO MEETING OF THE INSTITUTE 


The Fourth Summer Meeting of the Institute of Mathematical Statistics was 
held at The University of Chicago, Tuesday to Thursday, September 2 to 4, 
1941, in conjunction with the meetings of the American Mathematical Society, 
the Mathematical Association of America, and the Econometric Society. The 
following sixty-eight members of the Institute attended the meeting: 

R. L. Anderson, T. W. Anderson, K. J. Arnold, H. M. Bacon, Walter Bartky, W. D. 
Baten, A. A. Bennett, Paul Boschan, I. W. Burr, J. H. Bushey, W. E. Cederberg, W. G. 
Cochran, A. T. Craig, C. C. Craig, J. H. Curtiss, J. F. Daly, W. E. Deming, J. L. Doob,
P. L. Dressel, P. S. Dwyer, Churchill Eisenhart, M. L. Elveback, H. P. Evans, C. H. Fischer, 
W. C. Flaherty, R. M. Foster, C. H. Graves, Louis Guttman, W. L. Hart, F. C. Hinds, 
A. S. Householder, E. V. Huntington, William Hurwitz, M. H. Ingraham, Dunham Jackson, 
Leo Katz, J. F. Kenney, L. A. Knowler, L. F. Knudsen, Tjalling Koopmans, C. F. Kossack,
O. E. Lancaster, D. H. Leavens, B. A. Lengyel, W. G. Madow, J. N. Michie, A. M. Mood,
J. E. Morton, Leah Naugle, Harold Nisselson, J. I. Northam, E. G. Olds, Oystein Ore,
C. K. Payne, G. A. D. Preinreich, Francis Regan, Selby Robinson, C. F. Roos, M. M. 
Sandomire, Max Sasuly, Henry Scheffe, H. M. Schwartz, Harry Siller, J. H. Smith, M. E. 
Wescott, S. S. Wilks, E. W. Wilson, Gale Young. 

The opening session, on Tuesday morning, was devoted to contributed papers 
on Probability and Statistics and was held jointly with the American Mathe- 
matical Society and the Econometric Society. The Chairman was Professor 
A. T. Craig, University of Iowa, and the following papers were presented: 

1. A geometric derivation of Fisher's z-transformation.

J. B. Coleman, University of South Carolina. 

2. Large sample distribution of the likelihood ratio. 

Abraham Wald, Columbia University. 

3. On the integral equation of renewal theory. 

(Read by title.) 

Willy Feller, Brown University. 

4. Cumulative frequency functions.

Irving Burr, Purdue University.

5. On spherical probability distributions.

K. J. Arnold, Massachusetts Institute of Technology. 

6. Some observations on analysis of variance theory. 

(Read by title.) 

Hilda Geiringer, Bryn Mawr College. 

7. On the asymptotic distribution of medians of samples from a multivariate population. 

A. M. Mood, University of Texas. 

8. A problem of estimation. 

J. F. Daly, Catholic University. 

Abstracts of these papers follow this report. 

On Tuesday afternoon a session was held jointly with the Econometric Society 
on Time Series Analysis. Under the chairmanship of Professor C. C. Craig of
the University of Michigan, the following papers were presented: 
1. Is sampling theory applicable to economic time series?

Tjalling Koopmans, Penn Mutual Life Insurance Co., Philadelphia.

2. Serial correlation. 

R. L. Anderson, North Carolina State College. 

The morning session on Wednesday was held jointly with the Econometric 
Society on Curve Fitting. The chair was held by Dr. J. Marschak of the New 
School for Social Research and the following papers were presented: 

1. Weights to compensate for transformation in curve fitting. 

T. O. Yntema, University of Chicago and Cowles Commission.

2. Curve fitting by cumulative addition. 

John H. Smith, University of Chicago and Cowles Commission. 

On Wednesday afternoon. Professor S. S. Wilks of Princeton University acted 
as chairman of a session on Multivariate Analysis. The following papers were
read: 

1. On testing sets of means and discriminant analysis. 

Abraham Wald, Columbia University. 

2. On tests of hypotheses concerning variances and covariances. 

William G. Madow, Bureau of the Census. 

The Josiah Willard Gibbs Lecture of the American Mathematical Society was 
delivered on Wednesday evening by Professor Sewall Wright of the University 
of Chicago. His topic was Statistical Genetics and Evolution. 

On Thursday morning a joint session on Demand and Supply Analysis was 
held with the Econometric Society. At this session Dr. C. F. Roos of the In- 
stitute of Applied Econometrics presided, and the following papers were 
presented: 

1. Demand analysis for certain commodities based on income and budget data. 

J. Marschak, New School for Social Research, and George Garvey, National Bureau
of Economic Research.

2. Derivation of elasticities of demand and supply: A direct method.

Oscar Lange, University of Chicago and Cowles Commission.

3. On the workings of a general equilibrium system.

J. L. Mosak, University of Chicago and Cowles Commission.

An informal reception was held on Monday evening in the Judson Court 
Lounge. On Tuesday and Wednesday afternoons the ladies of the Mathematics 
Department of the University of Chicago served tea in the Eckhart Hall Common 
Room. After the joint session on Tuesday afternoon, the Cowles Commission 
for Research in Economics gave a tea in the Common Room of the Science 
Building. On Thursday evening a joint dinner of the four mathematical organi- 
zations was held in Hutchinson Commons, preceded by an informal reception 
at the Reynolds Club. 

Edwin G. Olds,

Secretary 



ABSTRACTS OF PAPERS 

(Presented on September 2, 1941, at the Chicago Meeting of the Institute) 


A Geometric Derivation of Fisher's z-transformation. J. B. Coleman, University
of South Carolina.

In fitting points in a plane by a line so that the sum of the squares of the perpendicular
deviations shall be a minimum, a second line is found for which the sum of the squares of
the deviations is a maximum. Let Σd² be the sum of the squares of the deviations of the
points from the minimum line, and ΣD² be the sum of the squares from the maximum line.
Then ΣD²/Σd² = (1 + r)/(1 − r). ½ log (1 + r)/(1 − r) is Fisher's z-transformation for testing
the coefficient of correlation.
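
The relation stated in this abstract can be checked numerically. The Python sketch below rests on assumptions that the abstract does not spell out: both variates are standardized to unit variance, the correlation is positive, and the two lines pass through the centroid of the points, so that the extreme sums of squared perpendicular deviations are the two characteristic roots of the 2 × 2 matrix of sums of squares and products. The function names are hypothetical.

```python
import math


def standardize(values):
    """Centre the values and scale them to unit variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def correlation(xs, ys):
    """Product-moment correlation coefficient r."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def perpendicular_sums(xs, ys):
    """Smallest and largest sum of squared perpendicular deviations over all
    lines through the centroid: the characteristic roots of the matrix of
    sums of squares and products."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    return half_trace - radius, half_trace + radius


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4, 5], [2, 1, 4, 3, 5]
    r = correlation(xs, ys)
    d_min, d_max = perpendicular_sums(standardize(xs), standardize(ys))
    print(d_max / d_min, (1 + r) / (1 - r))
    print(0.5 * math.log(d_max / d_min), math.atanh(r))
```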


Large Sample Distribution of the Likelihood Ratio. Abraham Wald, Columbia
University.

The large sample distribution of the likelihood ratio has been derived by S. S. Wilks
(Annals of Math. Stat., Vol. 9 (1938)) in case of a linear composite hypothesis and under
the assumption that the hypothesis to be tested is true. Here a general composite hypothesis
is considered and the distribution in question is derived also in case that the
hypothesis to be tested is not true. Let f(x_1, ..., x_p, θ^1, ..., θ^k) be the joint probability
density function of the variates x_1, ..., x_p involving k unknown parameters θ^1, ..., θ^k.
Denote by H_ω the hypothesis that the true parameter point θ = (θ^1, ..., θ^k) satisfies the
equations ξ_1(θ) = ... = ξ_r(θ) = 0, (r ≤ k). Denote by λ_n the likelihood ratio statistic for
testing H_ω on the basis of n independent observations on x_1, ..., x_p. For any parameter
point θ let ξ_{ij}(θ) = ∂ξ_i(θ)/∂θ^j and let c_{ij}(θ) be the expected value of

[∂ log f(x_1, ..., x_p, θ)/∂θ^i] [∂ log f(x_1, ..., x_p, θ)/∂θ^j]

calculated under the assumption that θ is the true parameter point.
For any θ denote by A(θ) the matrix || ξ_{ij}(θ) || (i = 1, ..., r; j = 1, ..., k) and let
|| σ_{ij}(θ) || = || c_{ij}(θ) ||^{-1}, (i, j = 1, ..., k). Let furthermore || σ*_{uv}(θ) ||, (u, v = 1, ..., r)
be the matrix equal to the product A(θ) || σ_{ij}(θ) || Ā(θ) where Ā(θ) is the transpose of
A(θ). Finally let || c*_{uv}(θ) || = || σ*_{uv}(θ) ||^{-1}, (u, v = 1, ..., r). For each n and θ denote
by y_{1n}(θ), ..., y_{rn}(θ) a set of r variates which have a joint normal distribution with mean
values √n ξ_1(θ), ..., √n ξ_r(θ) and covariance matrix || σ*_{uv}(θ) ||, (u, v = 1, ..., r). Denote
the quadratic form Σ_u Σ_v y_{un}(θ) y_{vn}(θ) c*_{uv}(θ) by Q_n(θ). It has been shown that under
certain assumptions on f(x_1, ..., x_p, θ), ξ_1(θ), ..., ξ_r(θ) we have
lim_{n→∞} {P(−2 log λ_n < t | θ) − P[Q_n(θ) < t]} = 0 uniformly in t and θ, where for any z, P(z < t | θ) denotes the
probability that z < t holds under the assumption that θ is the true parameter point. The
distribution of Q_n(θ) is known and has been treated in the literature. If H_ω is true, then
ξ_1(θ) = ... = ξ_r(θ) = 0, and Q_n(θ) has the χ² distribution with r degrees of freedom.

On the Integral Equation of Renewal Theory. W. Feller, Brown University.

As is well-known, the equation U(t) = G(t) + ∫_0^t U(t − x) dF(x) has frequently been
discussed, under different forms, in connection with the population theory, the theory of
industrial replacement, etc. In the present paper it is shown that, using Tauberian
theorems for Laplace integrals, it becomes possible to analyse in detail the asymptotic
behavior of U(t) as t → ∞, and also to solve some other problems which have been discussed
in the literature. Strict conditions for the validity of different methods to treat the equation
are given together with some modifications found to be necessary. The paper will
appear in the Annals of Mathematical Statistics.

Cumulative Frequency Functions. I. W. Burr, Purdue University.

Frequency and probability functions play a fundamental role in statistical theory and 
practice. They are, however, often inconvenient and difficult to use, since it is necessary 
to integrate or sum to find the probability for a given range. Theoretically the cumulative 
or integral frequency function would seem to be better adapted to determining such probabilities,
since the latter can be found simply by a subtraction. The aim of this paper is
to make a contribution toward the direct use of cumulative frequency functions. Some 
general properties and theory of cumulative functions are presented with particular emphasis
upon certain moment functions adapted to such direct use. Both continuous and discrete
cases are included. A list of possible cumulative functions is given and a particular
one, F(x) = 1 − (1 + x^c)^{−k}, is discussed fully. This function has properties which make it
practicable and adaptable to a wide variety of distribution types. It well illustrates the
possibilities of the cumulative approach. 

On Spherical Probability Distributions. Kenneth J. Arnold, Massachusetts 
Institute of Technology. 

Two methods of correspondence for circular distributions to the normal error function 
have led to non-constant absolutely continuous functions (See F. Zernike's article in Handbuch
der Physik, Vol. 3, pp. 477-478). The corresponding distributions for the sphere are
found. The case of diametrical symmetry for both circle and sphere is discussed. Tables 
of the probability integrals involved are given and an application in geology is included. 

Some Observations on Analysis of Variance Theory. Hilda Geiringer,
Bryn Mawr College.

The test functions used in analysis of variance present themselves in different classes
of important problems. Their distribution has been determined and tabulated by R. A.
Fisher¹ under the hypothesis that the chance variables are all independent of each other and
subject to the same normal law. Consequently we can in this way test only the hypothesis
that the theoretical populations have all these properties.

If it is not possible to determine the exact distribution of test functions under sufficiently
general assumptions regarding the populations we may: (a) find an asymptotic solution of
the problem, i.e. determine the distribution of the test functions for large samples.² Or (b)
determine at least the mathematical expectations and the variances of the test functions
for appropriately general populations and for small samples.

It is well known that the expectations of the two quadratic forms which are basic in the
analysis of variance are equal, even if the n populations are not normal but equal to each
other (Bernoulli series). But, in addition, we can prove the mathematical theorem that,
under the same conditions the expectation of their quotient equals one. The next step consists
in studying the case that the n distributions are not equal to each other and to investigate
certain inequalities characteristic for the Lexis Series and Poisson Series. These
different criteria are completed by the computation of the variances of the test functions.

In addition to the above mentioned test functions known as “variance within” and
“variance among” classes other symmetrical test functions have been considered in the
classical analysis of variance. Here again we may assume quite general populations. It
results that the Lexis as well as the Poisson Series may now be characterized by equalities
(instead of inequalities).

Finally it seems to be worthwhile to omit the assumption of independent chance variables
and to study different kinds of mutual dependence. These investigations lead to new instructive
inequalities among the expectations. These last considerations seem to be connected
with Fisher's “intraclass correlation” and to supplement this idea.

¹ Metron, Vol. 5 (1925), pp. 90-104.

² See e.g. W. G. Madow, Annals of Math. Stat., Vol. 11 (1940), p. 198.

On the Asymptotic Distribution of Medians of Samples from a Multivariate 
Population. A. M. Mood, University of Texas. 

Let two variates x_1 and x_2 have a density function f(x_1, x_2) which, besides being positive
or zero and having its integral over the whole space equal to one, shall satisfy these
conditions:

    \int f(x_1, 0)\, dx_1 \neq 0, \qquad \int f(0, x_2)\, dx_2 \neq 0.


The coordinate system is assumed to have been chosen so that the population median is at
the origin. Let (\tilde{x}_1, \tilde{x}_2) be the median of a sample of 2n + 1 elements drawn from a
population with this density function. It is shown that for large samples (\tilde{x}_1, \tilde{x}_2) is normally
distributed to within terms of order 1/\sqrt{n} with zero means and variances and covariances
given by certain integrals of f(x_1, x_2).

A similar result is true for k as well as two variates. 
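
As a rough numerical illustration of the large-sample statement in this abstract, the sketch below draws repeated samples of 2n + 1 points from a bivariate population, takes the coordinatewise sample median, and compares the empirical variance of each median with the familiar univariate large-sample value 1/(4 N f(0)^2), where N = 2n + 1 and f(0) is the marginal density at the median. Several things here are assumptions made only for the illustration and are not taken from the abstract: the population is taken to have independent standard normal coordinates (so the large-sample value is pi/(2N)), the "median of a sample" is read as the pair of coordinatewise medians, and the function names and default parameters are hypothetical. The covariance integrals mentioned in the abstract are not reproduced.

```python
import math
import random
import statistics


def simulate_median_variances(n=50, replicates=2000, seed=0):
    """Empirical variances of the coordinatewise medians of 2n + 1 points.

    Assumption for illustration only: the two coordinates are independent
    standard normal variates, so the population median is at the origin.
    """
    rng = random.Random(seed)
    size = 2 * n + 1
    medians_1, medians_2 = [], []
    for _ in range(replicates):
        xs = [rng.gauss(0.0, 1.0) for _ in range(size)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(size)]
        medians_1.append(statistics.median(xs))
        medians_2.append(statistics.median(ys))
    return statistics.variance(medians_1), statistics.variance(medians_2)


def asymptotic_median_variance(n):
    """Large-sample variance 1 / (4 N f(0)^2) for a standard normal marginal."""
    size = 2 * n + 1
    f0 = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at 0
    return 1.0 / (4.0 * size * f0 ** 2)


if __name__ == "__main__":
    v1, v2 = simulate_median_variances()
    print("empirical variances:", v1, v2)
    print("large-sample value: ", asymptotic_median_variance(50))
```

With a few thousand replicates the two empirical variances should sit close to the large-sample value; the agreement is only approximate because of simulation noise and finite-sample effects.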


A Problem in Estimation. Joseph F. Daly, The Catholic University of America.

Consider a normal population in which each individual is characterized by the variates
y_1, ..., y_{p+1}, y_{p+2}. Suppose that the latter two are not directly observable, but that
for given values of y_{p+1}, y_{p+2} the first set of y's is independently distributed about the
"regression line" y_k = y_{p+1} + k y_{p+2} (k = 1, ..., p) with a common variance \sigma^2. For each
individual, one can thus determine values \hat{y}_{p+1}, \hat{y}_{p+2} from the observed y_1, ..., y_p, using
the method of least squares. Assuming a similar relation between the expected values of
y_1, ..., y_{p+2} in the original population, these estimates \hat{y}_{p+1}, \hat{y}_{p+2} are, of course, unbiased.
However, if we calculate these \hat{y}'s for each individual of a sample and substitute them
in the Pearson product-moment correlation formula, the estimate of the correlation
between y_{p+1} and y_{p+2} thus obtained is somewhat biased. The bias depends on the number of
observable y's, and on the size of the variances and covariances of y_{p+1}, y_{p+2} relative to \sigma^2.

Is Sampling Theory Applicable to Economic Time Series? T. J. Koopmans,
Penn Mutual Life Insurance Company. 

The classical regression theory assumes that the values of the independent variables 
remain the same in repeated samples. Certain situations in economic analysis, like price 
formation according to the "cobweb" theorem, require a sampling theory of serial regression
in which certain observations may represent a dependent variable at one time and an inde- 
pendent variable at a later time. This leads to the problem of the joint distribution of 
certain quadratic forms in normal variables. 

The simplest problem of this type is that of the distribution of the ratio r = q/p of a
quadratic form q in T observations from a normal distribution with mean 0 to the sum p
of the squares of these observations. The distribution of r is independent of that of p
and is 



where the \kappa_t are the characteristic values of q, while the path of integration \gamma proceeds
from r through the lower half of the complex plane to a point on the real axis exceeding any
\kappa_t and from there returns to r through the upper half-plane.

In testing for the presence or absence of serial correlation (or regression) q is the sum of 
products of successive observations, and \kappa_t = \cos[\pi t/(T + 1)]. Replacing this set of
discrete values in the above integral by a continuous variable of similar distribution, the 
following approximation to the distribution of r is found: 

- / \ 
h*{r) (sin^ - (7 T - ~-^^Yco8*^<4 

Jarcinr V ^ / 
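
The characteristic values used in the serial correlation case can be checked numerically. The sketch below assumes that q is the sum of products of successive observations, q = x_1 x_2 + ... + x_{T-1} x_T, and that its characteristic values are cos(pi t/(T + 1)) for t = 1, ..., T; the second point is this sketch's reading of the printed expression, not a quotation. It builds the symmetric matrix of q and verifies that the vectors with components sin(pi t j/(T + 1)) are characteristic vectors belonging to those values. All function names are hypothetical.

```python
import math


def serial_product_matrix(T):
    """Symmetric matrix A with x'Ax = sum over t of x_t * x_{t+1}."""
    A = [[0.0] * T for _ in range(T)]
    for i in range(T - 1):
        A[i][i + 1] = 0.5
        A[i + 1][i] = 0.5
    return A


def quadratic_form(A, x):
    """Evaluate x'Ax for a square matrix A given as a list of rows."""
    size = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(size) for j in range(size))


def candidate_eigenpair(T, t):
    """Assumed characteristic value cos(pi t/(T+1)) and its sine vector."""
    kappa = math.cos(math.pi * t / (T + 1))
    v = [math.sin(math.pi * t * j / (T + 1)) for j in range(1, T + 1)]
    return kappa, v


def max_residual(T):
    """Largest entry of |A v - kappa v| over all t = 1, ..., T."""
    A = serial_product_matrix(T)
    worst = 0.0
    for t in range(1, T + 1):
        kappa, v = candidate_eigenpair(T, t)
        Av = [sum(A[i][j] * v[j] for j in range(T)) for i in range(T)]
        worst = max(worst, max(abs(Av[i] - kappa * v[i]) for i in range(T)))
    return worst


if __name__ == "__main__":
    print(max_residual(10))  # expected to be at rounding-error level
```

The check relies only on the identity sin(a(j - 1)) + sin(a(j + 1)) = 2 sin(aj) cos(a), with the end terms vanishing because sin(0) = sin(pi t) = 0.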



CONSTITUTION

OF THE

INSTITUTE OF MATHEMATICAL STATISTICS

ARTICLE I 
Name and Purpose

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics. 

ARTICLE II 

Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary 
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting. 

ARTICLE III 

Officers, Board of Directors, Committee on Membership, and Committee on 

Publications 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secre- 
tary-Treasurer, elected for a term of one year by a majority ballot at the annual meeting 
of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the 
individuals present at the organization meeting, and shall serve until December 31, 1936. 

2. The Board of Directors of the Institute shall consist of the Officers and the previous 
President. 

3. The Institute shall have a Committee on Membership composed of three Fellows. 
At their first meeting subsequent to the adoption of this Constitution, the Board of 
Directors shall elect three members as Fellows to serve as the Committee on Membership, 
one member of the Committee for a term of one year, another for a term of two years, 
and another for a term of three years. Thereafter the Board of Directors shall elect 
from among the Fellows one member annually at their first meeting after their election 
for a term of three years. The president shall designate one of the Vice-Presidents as 
Chairman of this Committee. 

4. The Institute shall have a Committee on Publications composed of three Members 
or Fellows elected by the Board of Directors. The President shall designate a Vice- 
President as Ex Officio Chairman of this Committee. 

ARTICLE IV 
Meetings 

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such 
time as the Board of Directors may designate. Additional meetings may be called from
time to time by the Board of Directors and shall be called at any time by the President
upon written request from ten Fellows. Notice of the time and place of meeting shall be
given to the membership by the Secretary-Treasurer at least thirty days prior to the
date set for the meeting. All meetings except executive sessions shall be open to the
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board 
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the
Secretary-Treasurer at least five days before the date set therefor. Should other business 
be passed upon, any member of the Committee shall have the right to reopen the ques- 
tion at the next meeting. 

4. At a regularly convened meeting of the Board of Directors, three members shall 
constitute a quorum. At a regularly convened meeting of the Committee on Member- 
ship, two members shall constitute a quorum. 

ARTICLE V 

Publications

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute. 
Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension 

1. Except for non-payment of dues, no one shall be expelled or suspended except by
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regu- 
larly convened meeting of the Institute provided notice of such proposed amendment 
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 

BY-LAWS 

ARTICLE I 

Duties of the Officers, Board of Directors, Committee on Membership, and 

Committee on Publications

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present,
shall preside at the meetings of the Institute and of the Board of Directors. At meetings
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings
of the Board of Directors he may vote in all cases. At least three months before the date
of the annual meeting, the President shall appoint a Nominating Committee of three 
members. It shall be the duty of the Nominating Committee to make nominations for 
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all 
voting members at least thirty days before the annual meeting. Additional nomina- 
tions may be submitted in writing, if signed by at least ten Fellows of the Institute, up to 
the time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings 
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the corre- 
spondence of the Institute. Subject to the direction of the Board, he shall have charge 
of the archives and other tangible and intangible property of the Institute. He shall 
send out calls for annual dues and acknowledge receipt of same; pay all bills approved 
by the President for expenditures authorized by the Board or the Institute; keep a 
detailed account of all receipts and expenditures, prepare a financial statement at the 
end of each year and present an abstract of the same at the annual meeting of the Insti- 
tute after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. 

5. The Committee on Publications, under the general supervision of the Board of 
Directors, shall have charge of all matters connected with the publications of the
Institute, and of all books, pamphlets, manuscripts and other literary or scientific material
collected by the Institute. Once a year this Committee shall cause to be printed in the 
Official Journal the Constitution and By-Laws and a classified list of all the Members 
and Fellows of the Institute. 


ARTICLE II 
Dues 

1. Members shall pay five dollars at the time of admission to membership and shall 
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual 
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt 
from all dues. 

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal. 

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues 
may be six months in arrears, and to accompany such notice by a copy of this Article.
If such person fail to pay such dues within three months from the date of mailing such
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors,
by whom the person's name may be stricken from the rolls and all privileges of membership
withdrawn. Such person may, however, be re-instated by the Board of Directors
upon payment of the arrears of dues. 

ARTICLE III 
Salaries

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 


ARTICLE IV 
Amendments

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amend- 
ment has been previously approved by the Board of Directors. 



MEMBERS OF THE INSTITUTE OF MATHEMATICAL STATISTICS*

(As of November 1, 1941)

Acerboni, Dr. Argentino V. Banfield Larroque 232, Banfield, Argentina.

Alter, Prof. Dinsmore Director of Griffith Observatory, Los Angeles, Calif.

Anderson, Paul H. Ph.D. (Illinois) Dept. of Math., Louisiana State Univ., University,
La.

Anderson, Richard L. Ph.D. (Iowa State Coll.) Part-time instr., North Carolina State
Coll., Raleigh, N. C.

Anderson, Theodore W., Jr. B.S. (Northwestern) Instr., Princeton Univ., Princeton,
N. J. Graduate College.

Anthony, Lucius Woodinville, Wash.

Arnold, Asso. Prof. H. E. Ph.D. (Yale) Wesleyan Univ., Middletown, Conn.

Arnold, Kenneth J. B.S. (Mass. Inst. of Tech.) 34 Field St., Boston, Mass.

Aroian, Leo A. Ph.D. (Michigan) Instr., Hunter Coll., New York, N. Y. 696 Park Ave.

Arrow, Kenneth J. M.A. (Columbia) Fellow, Columbia Univ., New York, N. Y. 749
West End Ave.

Ashcroft, A. Griffin M. E. (Cornell) Product Eng., Alex. Smith & Sons Carpet Co.,
Yonkers, N. Y.

Bachelor, Robert W. M.B.A. (Washington) American Bankers Association, New York,
N. Y.

Bacon, Asst. Prof. Harold M. Ph.D. (Stanford) Stanford Univ., Stanford University,
Calif. Box 1144’

Baker, George A. Ph.D. (Illinois) Experiment Sta., Coll. of Agric., Univ. of California,
Davis, Calif..

Barnes, Jarvis M.A. (Peabody) Teacher, Atlanta City Schools, Atlanta, Ga. 744 Virginia
Ct., NE.

Barral-Souto, Prof. Jose Sc.D. (Univ. of Buenos Aires) Buenos Aires, Argentina. Cordoba
1469.

Barrett, Claudius S. M.A. (Northwestern) Dept. Chief, Western Electric Co., Inc.,
Hawthorne Sta., Chicago, Ill.

Bartky, Asso. Prof. Walter Ph.D. (Chicago) Univ. of Chicago, Chicago, Ill.

Baten, Asso. Prof. Walter D. Ph.D. (Michigan) Res. Asso., Mich. Agric. Exp. Sta.,
Mich. State Coll., East Lansing, Mich. 411 Marshall St..

Bates, Prof. O. Kenneth Sc.D. (Mass. Inst. of Tech.) Cummings Prof. of Math. and
Head of Dept., St. Lawrence Univ., Canton, N. Y.

Battin, Isaac L. A.M. (Swarthmore) Instr., Brothers Coll. of Drew Univ., Madison, N. J.

Beal, Virginia B. B.A. (Mount Holyoke) Wis. Alum. Res. Found. Scholar, Univ. of
Wisconsin, Madison, Wis. B80 North Brooks St.

Beall, Dr. Geoffrey Dominion Entomological Lab., Chatham, Ont., Can. 7B9 Queen St.

Bechhofer, Robert B. A.B. (Columbia) Jr. Statistician, Aberdeen Proving Ground,
Aberdeen, Md. 107 Law St.

* Members were asked to supply fresh information for this Directory. Records may be
inexact or incomplete because of failure of some members to comply with this request.
Changes in addresses, or errors in names, titles or addresses should be reported to the
Secretary.
Been, Richard O. M.A. (George Washington) Agric. Economist, Bureau Agric. Economics,
Washington, D. C.

Bellinson, Harold R. M.S. (Mass. Inst. of Tech.) Asso. Statistician, Aberdeen Proving
Ground, Aberdeen, Md. Box S8B.

Bennett, Prof. A. A. Ph.D. (Princeton) Brown Univ., Providence, R. I.

Benson, Paul M.A. (Michigan) Instr., Bucknell Univ., Lewisburg, Pa.

Berger, Richard M.A. (Columbia) Nat. Bureau Econ. Research, New York, N. Y. B(f
Rugby Rd., Rockville Centre, N. Y.

Berkson, Dr. Joseph Mayo Clinic, Rochester, Minn.

Bernstein, Prof. Felix Ph.D. (Göttingen) New York Univ., New York, N. Y. BJi9
Broadway,,

Bingham, Marion D. A.B. (George Washington) Asst. Statistician, Dept. Agriculture,
Washington, D. C. 5fd N. Piedmont, Arlington, Va.

Blackburn, Asso. Prof. Raymond F. Ph.D. (Pittsburgh) Univ. of Pittsburgh,
Pittsburgh, Pa.

Blake, Archie Ph.D. (Chicago) Asso. Mathematician, U. S. Coast and Geodetic Survey,
Washington, D. C.

Blanche, Ernest E. Ph.D. (Illinois) Instr., Mich. State Coll., East Lansing, Mich.

Bliss, Chester I. Ph.D. (Columbia) Biometrician, New Haven and Storrs, Conn., Exp.
Stations. Conn. Agric. Exp. Station, Box 1106, New Haven, Conn.

Boley, Charles C. M.S. (Illinois) Asst. Mining Eng., Ill. State Geol. Survey, Natural
Resources Bldg., Urbana, Ill.

Bonis, Austin J. B.S. (C. C. N. Y.) 1226 Park Ave., New York, N. Y.

Book, Frances Bookkeeper, Salant and Salant, 56 Worth St., New York, N. Y. BSiS 
Aqueduct Ave., Bronx, N. Y. 

Boschan, Paul Ph.D. (Vienna) Statistician, Inst, of Applied Econometrics, 405 Lexington 
Ave., New York, N. Y. 

Bowker, Albert H. S.B. (Mass. Inst. of Tech.) Res. Asst., Mass. Inst. of Tech.,
Cambridge, Mass.

Boyer, Prof. Lee E. Ed.D. (Penna. State Coll.) Millersville State Teachers Coll.,
Millersville, Pa.

Brady, Dorothy S. Ph.D. (California) Home Ec. Specialist, Bureau of Home Economics,
Washington, D. C. S848 Calvert St.

Brandt, Alva E. Ph.D. (Iowa State Coll.) Prin. Soil Conservationist and Chief Conserv.
Exp. Sta. Div., Washington, D. C. Box 89, Route S, Vienna, Va.

Bridger, C. A. M.S. (Oregon State Coll.) Statistician, State Dept. Public Health, Boise,
Idaho. P. O. Box 144, Eagle, Idaho.

Bronfenbrenner, Martin Ph.D. (Chicago) Statistician and Analyst, Federal Reserve
Bank of Chicago, Chicago, Ill.

Brookner, Ralph J. M.S. (Michigan) Ensign, Bureau of Ordnance, Navy Dept., Wash- 
ington, D. C. 120 C St., NE, Apt. 202. 

Brooks, A. G. 2803 W. Erie St., Chicago, Ill.

Brown, George W. Ph.D. (Princeton) R. H. Macy and Co., New York, N. Y. 129 W.
86 St.

Brown, Richard H. A.B. (Columbia) Foundation for Study of Cycles, Lexington Ave.,
New York, N. Y. 1107 John Jay Hall, Columbia Univ.

Bryan, Joseph G. S.B. (Mass. Inst. of Tech.) 07 Green St., Melrose, Mass.

Burgess, R. W. Ph.D. (Cornell) Chief Statistician, Western Electric Co., 195 Broadway,
New York, N. Y.

Burr, Asst. Prof. Irving W. Ph.D. (Michigan) Purdue Univ., W. Lafayette, Ind. 266
Littleton St.

Bushey, Asso. Prof. J. H. Ph.D. (Michigan) Hunter Coll., New York, N. Y.

Caine, Walter B. M.B.A. (Northwestern) Sr. Rate Investigator, Fed. Power Com.; Dir.
Twentieth Century Fund, Survey of Rel. between Govt. and Elec. Light and Power
Industry. S57 N. Glebe Rd., Arlington, Va.

Calkins, Prof. Helen Ph.D. (Cornell) Head of Math. Dept., Penna. Coll. for Women,
Pittsburgh, Pa.

Camp, Prof. Burton H. Ph.D. (Yale) Wesleyan Univ., Middletown, Conn. 110 Mt.
Vernon St..

Carlson, John L. M.A. (Stanford) Instr., Reno High School, Reno, Nev. 7S6 West St.

Carlton, A. George B.A. (Gustavus Adolphus) Jr. Statistician, War Dept. General Staff,
G-4, Washington, D. C. SS9 S4th Pl., NE.

Carver, Prof. H. C. Ph.D. (Michigan) Dept. of Math., Univ. of Michigan, Ann Arbor,
Mich..

Cederberg, Prof. William E. Ph.D. (Wisconsin) Augustana Coll., Rock Island, Ill.
25 22i Ave.

Cell, Asso. Prof. John W. Ph.D. (Illinois) North Carolina State Coll., Raleigh, N. C.

Chang, Z. T. A.M. (Columbia) 7 Lane 720, Avenue Foch, Shanghai, China.

Chalmers, Juanita 36 West 139 St., New York, N. Y.

Chapman, Roy A. U. S. Forest Service, 1061 New Federal Bldg., New Orleans, La.

Clark, Asso. Prof. A. G. A.M. (Colorado) Colorado State Coll. of A. and M., Fort Collins,
Colo. 1012 Laporte Ave.

Cochran, Prof. William G. M.A. (Cambridge) Iowa State Coll., Ames, Iowa.. 

Cohen, Capt. Alonzo C., Jr. Ph.D. (Michigan) Picatinny Arsenal, Dover, N. J. 

Cohen, Melvin S. B.A. (Brooklyn) Jr. Statistician, Bureau of the Census, Washington, 
D. C. 100 C St., SE. 

Coleman, E. P. Dept. of Math., Municipal Univ. of Omaha, Omaha, Neb.

Coleman, Prof. James B. Ph.D. (California) Univ. of South Carolina, Columbia, S. C. 
620 Bull St. 

Cotterman, Charles W. Ph.D. (Ohio State Univ.) Res. Asso., Univ. of Michigan, Ann 
Arbor, Mich. 

Court, Louis M. M.A. (Cornell) 480 West 187 St., New York, N. Y. 

Cowan, Donald R. G. Ph.D. (Minnesota) Mgr., Coml. Res. Dept., Republic Steel Corp., 
Republic Bldg., Cleveland, Ohio. 

Cox, Gerald J. Ph.D. (Illinois) Sr. Commodity Specialist, Office of Prod. Mgt., Chem. 
Br., Washington, D. C. 3803 S St., NW. 

Cox, Prof. Gertrude M. M.S. (Iowa State Coll.) Head, Dept. of Experimental Statistics,
North Carolina State Coll., Raleigh, N. C.

Craig, Asso. Prof. Allen T. Ph.D. (Iowa) Univ. of Iowa, Iowa City, Iowa.. 

Craig, Asso. Prof. Cecil C. Ph.D. (Michigan) Univ. of Michigan, Ann Arbor, Mich.. 
Crathorne, Prof. A. R. Ph.D. (Göttingen) Univ. of Illinois, Urbana, Ill..

Crowe, S. E. 137 University Dr., East Lansing, Mich. 

Curtiss, Asst. Prof. John H. Ph.D. (Harvard) Cornell Univ., Ithaca, N. Y.

Daly, Joseph F. Ph.D. (Princeton) Instr., Catholic Univ. of Am., Washington, D. C. 
Dantzig, George B. 2609 Fulton St., Berkeley, Calif. 

Davies, Prof. George R. Ph.D. (North Dakota) Univ. of Iowa, Iowa City, Iowa. 

Day, Besse B. A.M. (Michigan) Statistician, Forest Serv., Dept. of Agriculture,
Washington, D. C.

Deemer, Walter L., Jr. A.B. (Lehigh) Educ. Res. Corp., 13 Kirkland St., Cambridge,
Mass. 

De Lury, Daniel B. Ph.D. (Toronto) Lecturer, Univ. of Toronto, Toronto, Ont., Can. 
Deming, W. Edwards Ph.D. (Yale) Prin. Mathematician, Bur. of the Census, Wash- 
ington, D. C.. 

Di Salvatore, Philip M.A. (Princeton) Head, Reinsurance Sec., Guardian Life Ins. Co., 
New York, N. Y. 

Dixon, Wilfrid J. M.A. (Wisconsin) Res. Asst., Princeton Univ., Princeton, N. J. 86 
Humbert St. 
Dodd, Prof. Edward L. Ph.D. (Yale) Univ. of Texas, Austin, Texas ITssi Ase*.

Dodge, Harold F. A.M. (Columbia) Quality Results Eng., Bell Telephone Labs., Inc.,
New York, N. Y.

Doob, Asst. Prof. J. L. Ph.D. (Harvard) Univ. of Illinois, Urbana, Ill. Linden Lane,
Princeton, N. J..

Dorn, Harold F. Ph.D. (Wisconsin) Sr. Statistician, U. S. Public Health Serv.,
Washington, D. C. 16 Burning Tree Ct., Bethesda, Md.

Dorweiler, Paul B.S. (Iowa) Actuary, Aetna Casualty and Surety Co., Hartford, Conn.

Dresch, Lt. (jg) Francis W. Ph.D. (California) U. S. N. R., Naval Proving Gr.,
Dahlgren, Va.

Dressel, Asst. Prof. Paul L. Ph.D. (Michigan) Michigan State Coll., East Lansing, Mich. 

Dunlap, Jack W. Ph.D. (Columbia) Catherine Strong Hall, Univ. of Rochester, Roch- 
ester, N. Y. 

Durand, David A.M. (Columbia) Natl. Bur. of Economic Res., Hillside, Riverdale, N. Y. 

Dutka, Jacques A.M. (Columbia) 56 West 180 St., Bronx, N. Y. 

Dwyer, Asst. Prof. Paul S. Ph.D. (Michigan) Univ. of Michigan, Ann Arbor, Mich. $609
James St..

Edgett, G. L. Queen’s Univ., Kingston, Ont., Can.

Eisenhart, Asst. Prof. Churchill Ph.D. (London) Univ. of Wisconsin, Madison, Wis.
Tower Room, Agronomy Bldg.

Elkin, William F. M.S. (Michigan) Kellogg Found. Fellow in Pub. Health Statistics,
Allegan County Health Dept., Allegan, Mich.

Elkins, Thomas A. A.M. (Princeton) Geophysicist, Gulf Res. and Dev. Co., Pittsburgh,
Pa. 9$56 Parkview Ave.

Elston, James S. Travelers Insurance Co., Hartford, Conn.

Elting, John P. Kendall Mills, Paw Creek, N. C.

Elveback, Mary L. M.A. (Minnesota) Hunter Coll., Park Ave. and 68 St., New York,
N. Y.

Embody, Daniel R. M.S. (Cornell) Instr., Cornell Univ., Ithaca, N. Y.

Ettinger, Wallace J. B.S. (Lewis Inst.) Designing Eng., Edison Gen. Elec. Appliance
Co., 5600 West Taylor St., Chicago, Ill.

Eudey, Mark A.B. (California) 1904 University Ave., Apt. 4, Berkeley, Calif.

Evans, Asso. Prof. Herbert P. Ph.D. (Wisconsin) Univ. of Wisconsin, Madison, Wis. 

Evans, W. D. B.S. (Clarkson) Prin. Economist, Bur. of Labor Statistics, Dept, of Labor, 
Washington, D. C. 

Faust, Richard H. B.S. (Yale) U. S. Army, Co. A. 28th Inf. Trn. Btn., Camp Croft, S. C.
$69 Vreeland Ave., Nutley, N. J.

Feldman, Hyman M. Ph.D. (Washington) Teacher, Beaumont H. S., St. Louis, Mo.

Feller, Asst. Prof. Willy Ph.D. (Göttingen) Brown Univ., Providence, R. I.

Fertig, Prof. John W. Ph.D. (Minnesota) Columbia Univ., De Lamar Inst. of Pub.
Health, 600 West 168 St., New York, N. Y.

Fischer, Carl H. Ph.D. (Iowa) Math. Dept., Univ. of Mich., Ann Arbor, Mich.

Fisher, Prof. Irving Yale University, New Haven, Conn. 1^80 Prospect St..

Flaherty, Asst. Prof. William C. A.B. (Georgetown) Georgetown Univ., Washington, 
D. C. 

Flood, M. M. Ph.D. (Princeton) Res. Asso., Princeton Local Govt. Survey, Princeton, 
N. J. 

Foster, Ronald M. S.B. (Harvard) Bell Telephone Labs., 463 West St., New York, N. Y.
162 East Dudley Ave., Westfield, N. J.

Fox, Asso. Prof. Philip G. A.M. (Wisconsin) Univ. of Wisconsin, Madison, Wis. 

Frankel, Lester R. 1445 Otis PL, NW, Washington, D. C. 

Freeman, Asst. Prof. Harold A. S.B. (Mass. Inst. of Tech.) Mass. Inst. of Tech.,
Cambridge, Mass.
Friedman, Milton M.A. (Chicago) Consult. Expert, Treasury Dept., Washington, D. C.;
Natl. Bur. of Econ. Res., 1819 Broadway, New York, N. Y. 16 St., NW, Washington,
D. C.

Fry, Thornton C. Ph.D. (Wisconsin) Res. Mathematician, Bell Telephone Labs., 463
West St., New York, N. Y..

Fryer, Asst. Prof. Holly C. Ph.D. (Iowa State Coll.) Kansas State Coll., Manhattan, Kan.

Gage, Robert P. M.S. (Iowa State Coll.) Asso. in Dept. of Biometry and Med. Statistics,
Mayo Clinic, Rochester, Minn.

Gause, G. Rupert B.S. (Citadel) Asst. Statistician, War Dept., Aberdeen Proving Gr.,
Md. Box 179, Aberdeen, Md.

Geiringer, Hilda P. Ph.D. (Vienna) Lecturer, Bryn Mawr Coll., Bryn Mawr, Pa..

Gibson, Robert W. A.M. (Illinois) 602 S. Busey St., Urbana, Ill.

Gintzler, Leone B. M.A. (California) Stat. Res. Asst., Bur. of Pub. Adm., Univ. of
California, Berkeley, Calif.

Girshick, M. A. Bur. of Home Economics, Dept. of Agriculture, Washington, D. C.

*Glover, Prof. James W. Dept. of Math., Univ. of Michigan, Ann Arbor, Mich..

Gordon, Robert D. A.M. (Stanford) Res. Asst., Scripps Inst. of Oceanography, La Jolla,
Calif.

Graves, Asst. Prof. Clyde H. Ph.D. (Chicago) Penna. State Coll., State College, Pa.

Greenwood, Asst. Prof. Joseph A. Ph.D. (Missouri) Duke Univ., Durham, N. C. ISH
Norton St.

Greville, Thomas N. E. Ph.D. (Michigan) Asso. Actuarial Mathematician, Bur. of the 
Census, Washington, D. C. 

Griffin, John I. Ph.D. (Columbia) Instr., Long Island Univ., Brooklyn, N. Y. 116
Henry St.

Grove, Asst. Prof. C. C. Ph.D. (Johns Hopkins) Coll. of the City of New York, 17 Lexington
Ave., New York, N. Y. US Milburn Ave., Baldwin, N. Y.

Guard, Harris T. M.S. (Colorado) Instr., Colorado State Coll., Fort Collins, Colo. 

Gumbel, Asso. Prof. Emil J. Ph.D. (Munich) New School for Social Res., 66 West 12 St., 
New York, N. Y. S8S0 Waldo Ave., 

Guttman, Louis M.A. (Minnesota) Instr., Cornell Univ., Ithaca, N. Y. 

Hagood, Mrs. Margaret Jarman Ph.D. (North Carolina) Res. Asso., Univ. of North 
Carolina, Chapel Hill, N. C. One Village Apts.

Haines, Harold M.S. (New York) Res. Asst., Burndy Engineering Co., 459 East 133 St., 
New York, N. Y. 

Hammer, Preston C. Ph.D. (Ohio State Coll.) Instr., Oregon State Coll., Corvallis, Ore.

Hand, Howard J. B.S. (Carnegie Inst. of Tech.) Head, Stat. Div., Met. Dept., Natl.
Tube Co., Lorain, Ohio 698 Lakeside Ave.

Hansen, Morris H. M.A. (American) Acting Asst. Chief Stat., Bur. of the Census,
Washington, D. C.

Harshbarger, Asst. Prof. Boyd M.A. (Illinois) Virginia Polytechnic Inst., Blacksburg,
Va. Fellows Reading Room, George Washington Univ., Washington, D. C.

Hart, Bertha I. A.M. (Cornell) Sr. Computer, Ballistic Res. Lab., Aberdeen Proving
Gr., Md.

Hart, Prof. William L. Ph.D. (Chicago) Univ. of Minnesota, Minneapolis, Minn.

Head, George A. Technical Staff, Bell Telephone Labs., Inc., 463 West St., New York, 
N. Y. 

Hebley, Henry F. Product Control Mgr., Pittsburgh Coal Co., P.O. Box 145, Pittsburgh, 
Pa. 


* Deceased. 
Myron S. Ph.D. (New York) Instr., New York Univ., New York, N. Y.;
Director, Survey of Res. in Recreation, Fed. Works Agency, New York, N. Y. lit
West 19 St.

Henderson, Robert D.Sc. (Toronto) Vice Pres. and Actuary, Retired, Equitable Life
Assur. Soc., New York, N. Y. Crown Point, Essex Co., N. Y..

Hendricks, Walter A. Div. of Agr. Stat., Agr. Marketing Serv., U. S. Dept. of Agriculture,
Washington, D. C.

Henry, Malcolm H. M.B. (Michigan) Asst. Stat., Michigan Dept. of Soc. Welfare,
Lansing, Mich. 1196 Todd Ave.

Hermie, Albert M. M.S. (Illinois) Jr. Statistician, Office of Prod. Mgt., 1102 Raleigh
Hotel, Washington, D. C. Fellows Reading Room, George Washington Univ.

Hildebrandt, Asst. Prof. E. H. C. Ph.D. (Michigan) State Teachers Coll., Upper
Montclair, N. J.

Hinds, Asst. Prof. Frances Campbell (Mrs.) M.A. (California) George Pepperdine Coll.,
Los Angeles, Calif. 1191 West 19 St.

Hoel, Asst. Prof. Paul G. Ph.D. (Minnesota) Univ. of California, Los Angeles, Calif.

Hopper, Asst. Prof. Grace M. Ph.D. (Yale) Vassar Coll., Poughkeepsie, N. Y.

Horst, Aaron P. Ph.D. (Chicago) Supervisor of Selection Res., Procter & Gamble,
Cincinnati, Ohio.

Hotelling, Prof. Harold Ph.D. (Princeton) Columbia Univ., New York, N. Y. Mountain
Lakes, N. J.

Householder, Alston S. Ph.D. (Chicago) Res. Asso. in Math. Biophysics, Univ. of
Chicago, Chicago, Ill.

Hoy, Asst. Prof. Elvin A. Univ. of Hawaii, Honolulu, Hawaii 509 West 191 St., New York,
N. Y.

Hsu, Chung-Tsi M.S. (London) 206 Livingston Hall, Columbia Univ., New York, N. Y.

Huntington, Prof. Emeritus Edward V. Ph.D. (Strassburg), S. D. (San Marcos) Harvard
Univ., Cambridge, Mass. 4^ Highland St..

Hurwitz, William 119 Concord Way, Washington, D. C. 

Ingraham, Prof. Mark H. Ph.D. (Chicago) Univ. of Wisconsin, Madison, Wis.. 

Jablon, Seymour A.M. (Columbia) 200 West 108 St., New York, N. Y. 

Jackson, Prof. Dunham Ph.D. (Göttingen) Univ. of Minnesota, Minneapolis, Minn..

Jackson, Robert W. B. Ph.D. (London) Lecturer, Univ. of Toronto, Toronto, Ont., Can. 

Jacobs, Walter A.M. (George Washington) Asst. Finan. Stat., Securities and Exchange 
Commission, Washington, D. C. 1491 Somerset PI. 

Juran, J. M. Western Electric Co., 196 Broadway, New York, N. Y. 

Jaramillo, Trinidad J. Ph.D. (Chicago) Actuary, Bur. of the Treasury; Instr., Far 
Eastern Univ., Manila, P. I. P. 0. Box 1046. 

Johner, Paul B.S. (Carnegie Inst. of Tech.) First Lt., Ordnance Dept., Pittsburgh
Ordnance Dist., 1202 Chamber of Commerce Bldg., Pittsburgh, Pa. 1118 Victoria Ave.,
New Kensington, Pa.

Johnson, Asst. Prof. Evan, Jr. Ph.D. (Chicago) Penna. State Coll., State College, Pa. 

Johnson, Prof. Palmer O. Ph.D. (Minnesota) Univ. of Minnesota, Minneapolis, Minn. 

Kantorovitz, Myron Ph.D. (Berlin) Res. Fellow, Milbank Memorial Fund, 40 Wall St.,
New York, N. Y. 9446 99 Street, Astoria, N. Y.

Katz, Leo M.A. (Wayne) Statistician, Dept. of Labor and Industry, Lansing, Mich.

Katz, Mortimer B. M.A. (George Washington) Jr. Statistician, Bur. of the Census,
Washington, D. C.

Katzoff, E. Taylor Ph.D. (Northwestern) Asst., Northwestern Univ., Evanston, Ill.

Keeping, E. S. Dept. of Math., Univ. of Alberta, Edmonton, Alberta, Can.

Keffer, Ralph M.A. (Wisconsin) Aetna Life Ins. Co., Hartford, Conn.

Kelley, Prof. Truman L. Ph.D. (Columbia) Grad. Sch. of Educ., Peabody House,
Kirkland St., Cambridge, Mass.
Kenney, John F. A.M. (Michigan) Univ. Extension, Univ. of Wisconsin, Milwaukee, Wis.

Kent, Robert H. A.M. (Harvard) Asso. Director, Ballistic Res. Lab., Aberdeen Proving
Gd., Md.

Kiernan, Prof. Charles J. M.S. (Columbia) St. John's Univ., Brooklyn, N. Y. t4
Fairbanks St., Hillside, N. J.

Kimball, Bradford F. Ph.D. (Cornell) Sr. Statistician, N. Y. State Pub. Serv.
Commission, 80 Centre St., New York, N. Y. SS Bogart Ave., Port Washington, N. Y.

King, Arnold J. B.S. (Wyoming) Agr. Statistician, U. S. D. A., A.M.S., Statistical Lab.,
Iowa State Coll., Ames, Iowa.

King, Frederick G. A.B. (Harvard) Asst. Statistician, N. Y. State Pub. Serv.
Commission, New York, N. Y. $09 West 116 St.

Kingston, Prof. Jorge C. E. (Brazil) Univ. of Brazil, Rio de Janeiro, Brazil 44 Rua 
Rita Ludolf. 

Knowler, Asst. Prof. L. A. Ph.D. (Iowa) Univ. of Iowa, Iowa City, Iowa. 

Knudsen, Lila F. B.S. (Minnesota) Asst. Mathematician, Food and Drug Adm., Fed.
Sec. Agency, Washington, D. C.

Kohl, Alma 1209 W. Sherwin Ave., Chicago, Ill.

Konijn, Hendrik S. C.E.W. (Rotterdam) Statistician, Natl. Bur. of Econ. Res., West
264 St. and Independence Ave., New York, N. Y. 245 East 72 St.

Koopmans, Tjalling Dr. Math. & Phys. Sc. (Leiden) Penn Mutual Life Ins. Co.,
Philadelphia, Pa.

Kossack, Carl F. Ph.D. (Michigan) Instr., Univ. of Oregon, Eugene, Ore. 

Kozelka, Asst. Prof. Richard L. Ph.D. (Minnesota) Univ. of Minnesota, Minneapolis, 
Minn. 

Kullback, Solomon Ph.D. (George Washington) Lecturer, George Washington Univ.,
Washington, D. C.

Kurtz, Albert K. Ph.D. (Ohio State) Res. Asso., Life Ins. Sales Res. Bur., Hartford,
Conn. 17 S Cornwall St.

Kury, Anita R. M.A. (Michigan) Analyst, Adm. Staff, W. P. A., 70 Columbus Ave., New 
York, N. Y. 188 Beach 68 St., Arverne, N. Y. 

Kwerel, Seymour M. B.S. (C. C. N. Y.) Econ. Analyst and Stat., Bur. of For. & Dom.
Commerce, Dept. of Commerce, Washington, D. C. 1666 Minford Pl., Bronx, N. Y.

Laderman, Jack 3224 Bronx Blvd., Bronx, N. Y. 

Lancaster, Asst. Prof. Otis E. Ph.D. (Harvard) Univ. of Maryland, College Park, Md.

Lange, Asso. Prof. Oscar LL.D. (Cracow) Univ. of Chicago, Chicago, Ill.

Larsen, Asst. Prof. Harold D. Ph.D. (Wisconsin) Univ. of New Mexico, Albuquerque, 
N. M. 

Leavens, Dickson H. M.A. (Yale) Res. Asso., Cowles Comm. for Res. in Economics,
Univ. of Chicago, Chicago, Ill.

Le Leiko, Max B.S. (New York) Statistician, Williard Contracting Co., 221 West 57 St., 
New York, N. Y. 

Lemme, Asst. Prof. Maurice M. A.M. (Indiana) Univ. of Toledo, Toledo, Ohio 

Lengyel, Bela A. Ph.D. (Pázmány) Instr., Rensselaer Poly. Inst., Troy, N. Y.

Levin, Ida M.Sc. (Johns Hopkins) 118 Millard Hall, Univ. of Minnesota, Minneapolis, 
Minn. 

Levin, Pvt. Joseph H. Ph.D. (Chicago) U. S. Army, Coast Art. Sch. Detachment, Fort 
Monroe, Va., 12616 Broadstreet Blvd., Detroit, Mich. 

Livers, Asst. Prof. Joe J. M.A. (Washington State Coll.) Montana State Coll., Bozeman, 
Mont. 

Livesay, Naomi Ph.M. (Wisconsin) Rock. Found. Fellow, Univ. of Chicago, Chicago,
Ill. 6010 Dorchester Ave.

Lorge, Asso. Prof. Irving Ph.D. (Columbia) Columbia Univ., New York, N. Y. 

Lotka, Dr. A. J. Metropolitan Life Ins. Co., One Madison Ave., New York, N. Y.
Lukacs, Eugene Ph.D. (Vienna) 5510 Pimlico Rd., Baltimore, Md.

Lundberg, Prof. George A. Ph.D. (Minnesota) Bennington Coll., Bennington, Vt.

McCarthy, Michael D. Univ. Coll., Cork, Ire.

McCarthy, Philip J. A.M. (Princeton) Asst., Princeton Univ., Princeton, N. J.

McDiarmid, Asst. Prof. Orville J. Ph.D. (Harvard) Carnegie Inst. of Tech., Pittsburgh,
Pa.

McEwen, Prof. G. F. Ph.D. (Stanford) Scripps Institution, La Jolla, Calif.

Macphail, Asst. Prof. Moray St. J. Ph.D. (Oxford) Acadia Univ., Wolfville, N. S. 4
Queenston Pl., Princeton, N. J.

Madow, William G. Ph.D. (Columbia) Bur. of the Census, Washington, D. C. tUB
Ogden St., NW, Washington, D. C.

Malzberg, Benjamin Ph.D. (Columbia) Sr. Statistician, N. Y. Dept. of Mental
Hygiene, Albany, N. Y.

Mansfield, Prof. Ralph S.M. (Chicago) Chicago Teachers Coll., 6800 S. Stewart Ave.,
Chicago, Ill.

Marschak, Prof. Jakob Ph.D. (Heidelberg) New School for Social Res., 66 West 12 St.,
New York, N. Y.

Marcuse, Mrs. Sophie M.A. (Columbia) B18 18 St., Santa Monica, Calif. 

Mauchly, John W. Ph.D. (Johns Hopkins) Instr., Univ. of Pennsylvania, Philadelphia, 
Pa. 
New York. Aroian, Arrow, Bachelor,
Bernstein, Bonis, Boschan, G. W.
Brown, R. H. Brown, Burgess, Bushey,
Chalmers, Court, Di Salvatore, Dodge,
Elveback, Fertig, Fry, Gumbel, Haines,
Head, Heidingsfield, Hoy, Hsu, Jablon,
Juran, F. G. King, Konijn, Le Leiko,
Lorge, Lotka, Marschak, Molina, Norris,
Paulsen, Payne, Peach, Peterson,
Preinreich, Romig, Roodkowsky, Roos,
Rubin, O. M. Smart, Soffer, Spiegelman,
M. N. Torrey, Wald, H. M. Walker,
Wallis, Welsh, Wiesenberg, Wilkinson,
Zeiger, Zubin.

Poughkeepsie. Hopper. 

Port Washington. Kimball.

Riverdale. Durand. 

Rochester. Dunlap. 

Rockville Centre. Berger. 

Schenectady. Wareham.

Staten Island. Wolfowitz. 

Troy. Lengyel. 

Woodhaven. Munch.

Yonkers. Ashcroft, Youden. 

New Mexico. (1) 

Albuquerque. Larsen.

North Carolina. (6) 

Chapel Hill. Hagood. 

Durham. Greenwood. 

Paw Creek. Eiting. 

Raleigh. R. L. Anderson, Cell, G. M. Cox.


Ohio. (11) 

Cincinnati. Horst.

Cleveland. Cowan, Van Voorhis. 
Columbus. L. E. Smart, Toops. 

Lorain. Hand. 

North Canton. Mummery, Schug. 
Oxford. Pollard. 

Toledo. Lemme. 

Wellington. Ruger.

Oregon. (3) 

Corvallis. Hammer. 

Eugene. Kossack. 

Salem. Olshen. 

Pennsylvania. (24) 

Aliquippa. Schrock. 

Bethlehem. Passano. 

Bryn Mawr. Geiringer. 

Lewisburg. Benson, Richardson. 
Millersville. Boyer.

New Kensington. Johner. 

Oakmont. Petrie. 

Overbrook Hills. Watson.
Philadelphia. Koopmans, Mauchly, 
Shohat. 

Pittsburgh. Blackburn, Calkins, Elkins,
Hebley, McDiarmid, Netzer, Niver,
Olds, Savulak.

State College. Graves, E. Johnson, 
Wagner. 

Philippine Islands. (3) 

Manila. Jaramillo, Mills, Toralballa. 

Rhode Island. (2) 
Providence. Bennett, Feller. 

South Carolina. (2) 

Clemson. Upholt.

Columbia. J. B. Coleman. 

Texas. (7) 

Austin. Dodd, Mood, Vickery, Villavaso. 
Dallas. Mouzon. 

Lubbock. Michie. 

Waco. Perry. 

Utah. (1) 

Salt Lake City. Woodbury. 

Vermont. (1)

Bennington. Lundberg.

Virginia. (11)

Arlington. Bingham, Caine, Schultz,
Shelton, Simmons, Thom.

Dahlgren. Dresch.

Lexington. Royston.

Lynchburg. Risley.

Staunton. Owen.

Vienna. Brandt.

Washington. (2)

Pullman. Vatnsdal.

Woodinville. Anthony.

Wisconsin. (7)

Madison. Beal, Eisenhart, H. P. Evans,
Fox, Ingraham, Ozanne.

Milwaukee. Kenney.

FOREIGN MEMBERS

Argentina. (2)

Banfield. Acerboni.

Buenos Aires. Barral-Souto.

Brazil. (1)

Rio de Janeiro. Kingston.

Canada. (6)

Chatham, Ontario. Beall.

Edmonton, Alberta. Keeping.

Kingston, Ontario. Edgett.

Toronto, Ontario. De Lury, R. W. B.
Jackson, Wolfenden.

China. (3)

Shanghai. Chang, Shen, Wei.

England. (2)

Ilfracombe, Devon. Perryman.

Manchester. Ross.

Ireland. (1)

Cork. M. D. McCarthy.